Abstract: This paper deals with an abstraction of a unified problem of drug discovery and pathogen identification. Pathogen identification involves identification of disease-causing biomolecules. Drug discovery involves finding chemical compounds, called lead compounds, that bind to pathogenic proteins and eventually inhibit the function of the protein. In this paper, the lead compounds are abstracted as inhibitors, pathogenic proteins as defectives, and the mixture of “ineffective” chemical compounds and non-pathogenic proteins as normal items. A defective could be immune to the presence of an inhibitor in a test. So, a test containing a defective is positive iff it does not contain its “associated” inhibitor. The goal of this paper is to identify the defectives, inhibitors, and their “associations” with high probability, or in other words, learn the Immune Defectives Graph (IDG) efficiently through group tests. We propose a probabilistic non-adaptive pooling design, a probabilistic two-stage adaptive pooling design, and decoding algorithms for learning the IDG. For the two-stage adaptive pooling design, we show that the number of tests sufficient to guarantee recovery of the inhibitors, defectives, and their associations with high probability, i.e., the upper bound, exceeds the proposed lower bound by a logarithmic multiplicative factor in the number of items. To be precise, lower and upper bounds of $\Omega((r+d)\log n + rd)$ and $O(rd\log n)$ tests respectively are identified for classifying $r$ inhibitors and $d$ defectives amongst $n$ items, and their associations. For the non-adaptive pooling design, we show that the upper bound (given by $O((r+d)^2\log n)$ tests) exceeds the proposed lower bound (given by $\max\{\Omega((r+d)\log n + rd),\ \Omega(\frac{r^2}{\log r}\log n),\ \Omega(d^2)\}$ tests) by at most a logarithmic multiplicative factor in the number of items.


I. Introduction 

Preliminary stages of drug discovery involve finding 
‘blocker’ or ‘lead’ compounds that bind to a biomolecular 
target, which is a disease causing pathogenic protein, in order 
to inhibit the function of the protein. Such compounds are later 
used to produce new drugs. These lead compounds have to be 
identified amidst billions of chemical compounds [1], [2], and
hence drug discovery is a tedious process. A complementary
problem involves identifying pathogenic proteins amidst non- 
pathogenic ones, both of which are structurally identical in 
some respects. For instance, out of five known species of
ebolavirus, only four are pathogenic to humans (see
p. 5 in [2]), and a similar example can be found in arenavirus
[3]. Some of these pathogenic proteins might share a common
inhibitory mechanism against a lead compound, which serves
to distinguish them from the non-pathogenic ones [3]. So,
finding potential pathogenic proteins amidst a large collection 
of biomolecules by testing them against known inhibitory 
compounds is a problem complementary to the problem of lead 
compound discovery. The lead compounds can be abstracted as 
inhibitor items, the pathogenic proteins as defective items, and 
the others as normal items. Now, the above problems can be 
combined to be viewed as an inhibitor-defective classification 
problem on the mixture of pathogenic and non-pathogenic 
proteins, and billions of chemical compounds. This unifies the 
process of finding both the pathogenic proteins and the lead 
compounds. An efficient means of solving this problem could 
potentially be applied in high-throughput screening for drugs 
and pathogens or computer-assisted drug and pathogen
identification. A natural consideration is that, while some pathogenic
proteins might be inhibited by some lead compounds, other 
pathogenic proteins might be immune to some of these lead 
compounds present in the mixture of items. In other words, 
each defective item is possibly immune to the presence of 
some inhibitor items so that its expression cannot be prevented 
by the presence of those inhibitors when tested together. By 
definition, an inhibitor inhibits at least one defective. Learning
this inhibitor-defective interaction, as well as classifying the
inhibitors and defectives efficiently through group testing, is
the subject of this work.

A representation of this model, which we refer to as the Immune-Defectives Graph (IDG) model, is given in Fig. 1. The presence of a directed edge between a pair of vertices $(w_{i_{k_1}}, w_{j_{k_2}})$ represents the inhibition of the defective $w_{j_{k_2}}$ by the inhibitor $w_{i_{k_1}}$, and the absence of a directed edge between a pair of vertices $(w_{i_{k_1}}, w_{j_{k_2}})$ indicates that the inhibitor $w_{i_{k_1}}$ does not affect the expression of the defective $w_{j_{k_2}}$ when tested together. A formal presentation of the IDG model and the goals of this paper appear in the next section.

Example 1: An instance of the IDG model is given in Fig. 2. In this example, the outcome of a test is positive iff a defective $w_{j_{k_2}}$, for some $k_2$, is present in the test and its associated inhibitor $w_{i_{k_2}}$ does not appear in the test. Observe that if the item-pair $(w_{i_{k_1}}, w_{j_{k_2}})$, for $k_1 \neq k_2$, appears in a test and $w_{i_{k_2}}$ does not appear in the test, then the outcome is positive. Also, if the item-pair $(w_{i_{k_1}}, w_{j_{k_1}})$ appears in a test and if $w_{j_{k_2}}$ also appears in the test but not $w_{i_{k_2}}$, then the test outcome is positive. But if the appearance of every defective $w_{j_k}$ in a test is compensated by the appearance of its associated inhibitor $w_{i_k}$ in the test, then the test outcome is negative. The outcome of a test is also negative when none of the defectives appear in a test.

Fig. 1: A representation of the IDG Model, where $\mathcal{I}$ represents the set of inhibitors and $\mathcal{D}$ represents the set of defectives.

Fig. 2: An example for the IDG Model where each defective is associated with a distinct inhibitor so that $r = d$.

The IDG model can also be viewed as a generalization of the 1-inhibitor model introduced by Farach et al. in [4]. This model was motivated by errors in blood testing where blocker compounds (i.e., inhibitors) block the expression of defectives in a test [5]. It is also motivated by drug discovery applications where the inhibitors are actually desirable items that inhibit the pathogens [6]. In the 1-inhibitor model, a test outcome is positive iff there is at least one defective and no inhibitors in the test. So, the presence of a single inhibitor is sufficient to ensure that the test outcome is negative.

Efficient testing involves pooling different items together in every test so that the number of tests can be minimized [7]. Such a testing methodology is called group testing. The pooling methodology can be of two kinds, namely non-adaptive and adaptive pooling designs. In non-adaptive pooling designs, any pool constructed for testing is independent of the previous test outcomes, while in adaptive pooling designs, some constructed pools might depend on the previous test outcomes. A $k$-stage adaptive pooling design is comprised of pool construction and testing in $k$ stages, where the pools constructed for (non-adaptive) testing in the $k$-th stage depend on the outcomes in the previous stages. While adaptive group testing requires fewer tests than non-adaptive group testing, the latter inherently supports parallel testing of multiple pools. Thus, non-adaptive group testing is more economical (because it allows for automation) and saves time (because the pools can be prepared all at once), both of which are of concern in library screening applications [8]. The 1-inhibitor model has been extensively studied, and several adaptive and non-adaptive pooling designs for classification of the inhibitors and the defectives are known (see [9]-[12]). A detailed survey of known non-adaptive and adaptive pooling designs for the 1-inhibitor model is given in [13]. The best (in terms of number of tests) known non-adaptive pooling design that guarantees high probability classification of the inhibitors and defectives is proposed in [13]. The non-adaptive pooling design proposed in [13] requires $O(d\log n)$ tests in the $r = O(d)$ regime and $O(\frac{r^2}{d}\log n)$ tests in the $d = o(r)$ regime to guarantee classification of both the inhibitors and defectives with high probability.¹ In the small inhibitor regime, i.e., $r = O(d)$, the upper bound on the number of tests matches the lower bound, while in the large inhibitor regime, i.e., $d = o(r)$, the upper bound exceeds the lower bound of $\Omega(\frac{r^2}{d\log(r/d)}\log n)$ by a $\log\frac{r}{d}$ multiplicative factor. Nonetheless, the 1-inhibitor model requires that every inhibitor inhibit every defective, which is likely to be a restrictive requirement in practice. So, the IDG model is a more practical variant of the 1-inhibitor model.

A formal presentation of the IDG model and the goals of 
this paper are given in the next section. 

Notations: The Bernoulli distribution with parameter $p$ is denoted by $\mathcal{B}(p)$, where $p$ denotes the probability of the Bernoulli random variable taking a value of one. The set of binary numbers is denoted by $\mathbb{B}$. Matrices are indicated by boldface uppercase letters and vectors by boldface lowercase letters. The row-$i$, column-$j$ entry of a matrix $\mathbf{M}$ is denoted by $\mathbf{M}(i,j)$ and the coordinate-$i$ of a vector $\mathbf{y}$ is denoted by $\mathbf{y}(i)$. All the logarithms in this paper are taken to the base two. The probability of an event $\mathcal{E}$ is denoted by $\Pr\{\mathcal{E}\}$. The notation $f(n) \approx g(n)$ represents approximation of a function $f(n)$ by $g(n)$. Mathematically, the approximation denotes that for every $\epsilon > 0$, there exists $n_0$ such that for all $n > n_0$,


II. The IDG Model 

Consider a set of items $\mathcal{W}$ indexed as $w_1, \dots, w_n$, comprised of $r$ inhibitors, $d$ defectives, and $n - r - d$ normal items. It is assumed throughout the paper that $r, d = o(n)$.

Definition 1: An item pair $(w_i, w_j)$, for $i \neq j$, is said to be associated when the inhibitor $w_i$ inhibits the expression of the defective $w_j$. An item pair $(w_i, w_j)$, for $i \neq j$, is said to be non-associated if either the inhibitor $w_i$ does not inhibit the expression of the defective $w_j$, or $w_i$ is not an inhibitor, or $w_j$ is not a defective.

In general, the mention of an item pair $(w_i, w_j)$ need not mean that $w_i$ is an inhibitor and $w_j$ is a defective. This is understood from the context.

¹ The numbers of inhibitors, defectives, and normal items are denoted by $r$, $d$, and $n - d - r$ respectively.

Definition 2: An association graph is a left-to-right directed bipartite graph $B = (\mathcal{I}, \mathcal{D}, \mathcal{E})$, where the set of vertices (on the left hand side) $\mathcal{I} = \{w_{i_1}, w_{i_2}, \dots, w_{i_r}\} \subset \mathcal{W}$ denotes the set of inhibitors, the set of vertices (on the right hand side) $\mathcal{D} = \{w_{j_1}, w_{j_2}, \dots, w_{j_d}\} \subset \mathcal{W}$ denotes the set of defectives, and $\mathcal{E}$ is a collection of directed edges from $\mathcal{I}$ to $\mathcal{D}$. A directed edge $e = (w_i, w_j) \in \mathcal{E}$, for $i \in \{i_1, \dots, i_r\}$, $j \in \{j_1, \dots, j_d\}$, denotes that the inhibitor $w_i$ inhibits the expression of the defective $w_j$.

We refer to $\mathcal{E}(\mathcal{I}, \mathcal{D})$, conditioned on the sets $(\mathcal{I}, \mathcal{D})$, as the association pattern on $(\mathcal{I}, \mathcal{D})$.

A pooling design is denoted by a test matrix $\mathbf{M} \in \mathbb{B}^{T \times n}$, where the $j$-th item appears in the $i$-th test iff $\mathbf{M}(i,j) = 1$. A test outcome is positive iff the test contains at least one defective without any of its associated inhibitors. A positive outcome is denoted by one and a negative outcome by zero.
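The outcome rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `idg_outcomes`, the list-of-rows encoding of the test matrix, and the toy instance (items 0 and 1 as inhibitors, 2 and 3 as defectives, 4 as a normal item) are hypothetical choices and do not come from the paper.

```python
def idg_outcomes(M, defectives, associations):
    """Return the 0/1 outcome of every test (row of M) under the IDG model.

    M            -- list of rows; M[i][j] == 1 iff item j is in test i
    defectives   -- set of defective item indices
    associations -- dict: defective index -> set of associated inhibitor
                    indices (a missing key means "no associated inhibitor")
    """
    outcomes = []
    for row in M:
        pool = {j for j, bit in enumerate(row) if bit == 1}
        # Positive iff some defective is present without any of its
        # associated inhibitors.
        positive = any(
            u in pool and not (associations.get(u, set()) & pool)
            for u in defectives
        )
        outcomes.append(1 if positive else 0)
    return outcomes


# Hypothetical toy instance in the spirit of Fig. 2 (r = d = 2):
# items 0, 1 are inhibitors, items 2, 3 are defectives, item 4 is normal.
M = [
    [0, 0, 1, 0, 0],  # defective 2 alone
    [1, 0, 1, 0, 0],  # defective 2 with its associated inhibitor 0
    [0, 1, 1, 0, 0],  # defective 2 with the non-associated inhibitor 1
    [1, 1, 1, 1, 1],  # every defective is accompanied by its inhibitor
    [1, 1, 0, 0, 1],  # no defective in the pool
]
y = idg_outcomes(M, defectives={2, 3}, associations={2: {0}, 3: {1}})
```

In this toy instance only the first and third pools are positive, since those are the only pools in which a defective appears without its associated inhibitor.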

It is assumed throughout the paper that the defectives are not mutually obscuring, i.e., a defective does not function as an inhibitor for some other defective. In other words, the set of inhibitors $\mathcal{I}$ and the set of defectives $\mathcal{D}$ are disjoint.

The goal of this paper is to identify the association graph, or in informal terms, learn the IDG. Thus, the objectives are two-fold, as represented by Fig. 3.

1) Identify all the defectives.

2) Identify all the inhibitors and also their associations with the defectives.

Fig. 3: Here, the presence of a directed arrow represents an association between an inhibitor and a defective. The problem statement is to identify the set of inhibitors $\mathcal{I}$, defectives $\mathcal{D}$, and the association pattern $\mathcal{E}(\mathcal{I}, \mathcal{D})$.

This problem is further mathematically formulated as follows. Denote the actual set of inhibitors, normal items, and defectives by $\mathcal{I}$, $\mathcal{N}$, and $\mathcal{D}$ respectively, so that $\mathcal{I} \cup \mathcal{N} \cup \mathcal{D} = \mathcal{W}$. The actual association pattern between the actual inhibitor and defective sets is represented by $\mathcal{E}(\mathcal{I}, \mathcal{D})$. Let $\hat{\mathcal{I}}$, $\hat{\mathcal{N}}$, $\hat{\mathcal{D}}$, and $\hat{\mathcal{E}}(\hat{\mathcal{I}}, \hat{\mathcal{D}})$ denote the declared set of inhibitors, normal items, defectives, and the declared association pattern between $(\hat{\mathcal{I}}, \hat{\mathcal{D}})$ respectively. The target is to meet the following error metric.

$$\max \Pr\left\{ \left(\hat{\mathcal{I}}, \hat{\mathcal{D}}, \hat{\mathcal{E}}(\hat{\mathcal{I}}, \hat{\mathcal{D}})\right) \neq \left(\mathcal{I}, \mathcal{D}, \mathcal{E}(\mathcal{I}, \mathcal{D})\right) \right\} \leq c\,n^{-\delta}, \tag{1}$$

for some constants $c, \delta > 0$. We propose pooling designs and decoding algorithms, and lower bounds on the number of tests required to satisfy the above error metric. It is assumed that the defective and the inhibitor sets are distributed uniformly across the items, i.e., the probability that any given set of $r + d$ items constitutes all the defectives and inhibitors is given by $1/\binom{n}{r+d}$. It is also assumed that the association pattern $\mathcal{E}(\mathcal{I}, \mathcal{D})$ is uniformly distributed over all possible association patterns on $(\mathcal{I}, \mathcal{D})$.

We consider two variants of the IDG model. The first is the case where the maximum number of inhibitors that can inhibit any defective, given by $I_{\max}$, is known. We refer to this model as the IDG with side information (IDG-WSI) model. For example, Fig. 2 represents a case where $I_{\max} = 1$. While it is known that $I_{\max} = 1$, it is unknown which among the items $w_1, \dots, w_n$ are inhibitors and which are defectives. For a given value of $(r, d)$, not all positive integer values of $I_{\max} \leq r$ might be feasible. For instance, if $(r, d) = (3, 2)$, then $I_{\max} = 1$ is not feasible because, by definition, each inhibitor is associated with at least one defective. So, in the IDG-WSI model, we assume that the given value of $I_{\max}$ is feasible for the $(r, d)$ tuple. In particular, if $(c-1)d < r \leq cd$ for some integer $c \geq 1$, then $I_{\max} \geq c$. This immediately follows from the fact that each inhibitor must be associated with at least one defective, so that at least one of the $d$ defectives is associated with $\lceil r/d \rceil = c$ or more inhibitors.
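The feasibility condition on $I_{\max}$ can be checked mechanically. The following is a minimal sketch, assuming only the two constraints stated in the text ($I_{\max} \leq r$ and $I_{\max} \geq c$ whenever $(c-1)d < r \leq cd$); the helper names are hypothetical and not taken from the paper.

```python
def min_feasible_imax(r, d):
    """Smallest I_max allowed by the text: the integer c with (c-1)d < r <= cd,
    i.e. ceil(r / d), computed with integer arithmetic."""
    return -(-r // d)


def satisfies_imax_bounds(r, d, i_max):
    """True iff i_max passes both constraints: c <= i_max <= r."""
    return min_feasible_imax(r, d) <= i_max <= r


# The example from the text: (r, d) = (3, 2) rules out I_max = 1.
example_min = min_feasible_imax(3, 2)
example_ok = satisfies_imax_bounds(3, 2, 1)
```

For $(r, d) = (3, 2)$ the smallest admissible value is 2, which agrees with the observation that $I_{\max} = 1$ is not feasible there.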


The other variant of the IDG model we consider in this 
paper is the case where there is no side information about 
the inhibitor-defective associations, which means that each 
defective can be inhibited by as many as r inhibitors. We 
refer to this model as the IDG-No Side Information (IDG- 
NSI) model. For both the models, the goals (as stated in the 
beginning of this section) are the same. 

The contributions of this paper for the IDG models are 
summarized below. 


• The number of tests sufficient to recover the association graph while satisfying the error metric (1) using the proposed

  - non-adaptive pooling design is given by $T_{NA} = O((r+d)^2 \log n)$ and $T_{NA} = O((I_{\max}+d)^2 \log n)$ tests for the IDG-NSI and IDG-WSI models respectively (Theorem 1, Section III).

  - two-stage adaptive pooling design is given by $T_A = O(rd \log n)$ and $T_A = O(I_{\max} d \log n)$ tests for the IDG-NSI and IDG-WSI models respectively (Theorem 2, Section III).

• In Section IV (Theorem 4 and Theorem 5), lower bounds of

$$\max\left\{ \Omega\left((r+d)\log n + rd\right),\ \Omega\left(\frac{r^2}{\log r}\log n\right),\ \Omega(d^2) \right\},$$

$$\max\left\{ \Omega\left((r+d)\log n + \ldots\right),\ \Omega\left(\frac{I_{\max}^2}{\log I_{\max}}\log n\right),\ \Omega(d^2) \right\}$$

are obtained for non-adaptive pooling designs for the IDG-NSI and IDG-WSI models respectively. The first lower bounds for both the models are valid for adaptive pooling designs also. The third lower bound for the IDG-WSI model is valid under some mild restrictions on $I_{\max}$ and $r$, the details of which are given in Theorem 5.

The pooling design matrices $\mathbf{M}$ constructed in this paper use carefully chosen “random matrices”, i.e., the entries of the matrices are chosen independently from a suitable Bernoulli distribution. Such matrices are known to permit ease of analysis [14]. Notwithstanding the simplicity of the pooling design construction, figuring out a good decoding algorithm with a reasonable computational complexity and good lower bounds, especially for non-adaptive pooling designs, is a challenging task. The goodness of the (pooling design, decoding algorithm) tuple and the proposed lower bounds is measured in terms of the closeness of the upper bounds to the lower bounds on the number of tests. For non-adaptive pooling designs, this can be observed from Table I. For the proposed adaptive pooling design, the upper bound exceeds the lower bound by at most a $\log n$ multiplicative factor for both IDG-NSI and IDG-WSI models. Also, the proposed decoding algorithms have a computational complexity of $O(nT_{NA})$ and $O(nT_A)$ time units for the non-adaptive and adaptive pooling designs, respectively. This intuitively means that an item is “processed” at most a constant number of times per test.

Extension of the results on the upper and lower bounds on 
the number of tests to the case where only upper bounds on the 
number of inhibitors (given by R) and defectives (given by D) 
are known instead of their exact numbers is straightforward. 
The target error metric in (1) is re-formulated as a maximum
error probability criterion over all combinations of number of
inhibitors and defectives. The results for this case follow by 
replacing r by R and d by D in the upper and lower bounds 
on the number of tests. 

There are various generalizations of the 1-inhibitor model 
considered in the literature. These models are summarized in 
the following sub-section to show that the model considered in 
this paper, to the best of our knowledge, has not been studied 
in the literature. 

A. Prior Works 

The 1-inhibitor model can be generalized in various directions, mostly influenced by generalizations of the classical group testing model. The various generalizations are listed below and briefly described. Though none of these generalizations include the model studied in this paper, it is worthwhile to understand the differences between these models and the IDG model.

A generalization of the 1-inhibitor model, namely the $k$-inhibitor model, was introduced in [15]. In the $k$-inhibitor model, an outcome is positive iff a test contains at least one defective and no more than $k - 1$ inhibitors. So, the number of inhibitors must be no less than a certain threshold $k$ to cancel the effect of any defective. This model is different from the model introduced in this paper because, in the IDG model, a single associated inhibitor is enough to cancel the effect of a defective. Further, none of the inhibitors might be able to cancel the effect of a defective because the defective might not be associated with any inhibitor. A model loosely related to the 1-inhibitor model, namely the mutually obscuring defectives model, was introduced in [16]. Here, it was assumed that multiple defectives could cancel the effect of each other, and hence the outcome of a test containing multiple defectives could be negative. Thus, a defective can also function as an inhibitor. However, in this paper, the sets of defectives and inhibitors are assumed to be disjoint. The threshold (classical) group testing model is one where a test outcome is positive if the test contains at least $u$ defectives, negative if it contains no more than $l$ defectives, and arbitrarily positive or negative otherwise [17]. This model was combined with the $k$-inhibitor model, and non-adaptive pooling designs for the resulting model were proposed in [18].

A non-adaptive pooling design for the general inhibitor model was proposed in [19]. Here, the goal was to identify all the defectives with no prior assumption on the cancellation effect of the inhibitors on the defectives, i.e., the underlying unknown inhibitor model could be a 1-inhibitor model, a $k$-inhibitor model, or even the IDG model introduced in this paper. However, the difference from our work is that we aim to identify the association graph or, in other words, the cancellation effect of the inhibitors, in addition to identifying the defectives. But this cancellation effect does not include the $k$-inhibitor model cancellation effect, as noted earlier. Group testing on the complex model was introduced in [20]. In the complex model, a test outcome is positive iff the test contains at least one of the defective sets. So, here the notion of defective items is generalized to sets of defective items called defective sets. This complex model was combined with the general inhibitor model, and non-adaptive pooling designs for identification of defectives were proposed in [21]. Our work is different from [21] for the same reasons as stated for [19]. Group testing on bipartite graphs was proposed in [22] as a special case of the complex model. Here, the left hand side of the bipartite graph represents the bait proteins and the right hand side represents the prey proteins. It is known a priori which items are baits and which ones are preys. The edges in the bipartite graph represent associations between the baits and preys. A test outcome is positive iff the test contains associated items, and the goal was to identify these associations. Clearly, this model is different from the IDG because, in the IDG model, there are three types of items involved and the interactions between the three types of items are different from those in [22].

In the next section, we propose a probabilistic non-adaptive
and a probabilistic two-stage adaptive pooling design and
decoding algorithms for both the variants of the IDG model
discussed in this section.
TABLE I: Necessary and sufficient number of tests for various regimes of the number of inhibitors, defectives, and $I_{\max}$. In the large inhibitor regime, i.e., $d = O(r)$ for the IDG-NSI model and $d = O(I_{\max})$ for the IDG-WSI model, the upper bounds exceed the lower bounds by multiplicative factors of $\log r$ and $\log I_{\max}$ for the IDG-NSI and IDG-WSI models respectively. In the small inhibitor regime, i.e., $r = o(d)$ for the IDG-NSI model and $I_{\max} = o(d)$ for the IDG-WSI model, the upper bounds exceed the lower bounds by a multiplicative factor of $\log n$ for both IDG-NSI and IDG-WSI models.

Model   | Large inhibitor regime ($d = O(r)$, $d = O(I_{\max})$) | Small inhibitor regime ($r = o(d)$, $I_{\max} = o(d)$)
IDG-NSI | Upper bound: $O(r^2 \log n)$; Lower bound: $\Omega(\frac{r^2}{\log r}\log n)$ | Upper bound: $O(d^2 \log n)$; Lower bound: $\Omega(d^2)$
IDG-WSI | Upper bound: $O(I_{\max}^2 \log n)$; Lower bound: $\Omega(\frac{I_{\max}^2}{\log I_{\max}}\log n)$ | Upper bound: $O(d^2 \log n)$; Lower bound: $\Omega(d^2)$


III. Pooling designs and Decoding Algorithm 


In this section, we propose a non-adaptive pooling design and decoding algorithm as well as a two-stage adaptive pooling design and decoding algorithm for the IDG-WSI model. The pooling designs and decoding algorithms for the IDG-NSI model follow from those for the IDG-WSI model by replacing $I_{\max}$ by $r$.

Non-adaptive pooling design: The pools are generated from the matrix $\mathbf{M}_{NA} \in \mathbb{B}^{T_{NA} \times n}$. The entries of $\mathbf{M}_{NA}$ are i.i.d. as $\mathcal{B}(p_1)$. Test the pools denoted by the rows of $\mathbf{M}_{NA}$. Let the outcome vector be given by $\mathbf{y} \in \mathbb{B}^{T_{NA} \times 1}$. The exact value of $T_{NA}$ is specified in (11) and (12) (where $T_{NA} = \beta_{NA} \log n$) in Sub-section III-A and its scaling is given in Theorem 1 (which appears before the beginning of Sub-section III-A). The exact value of $p_1$ is also given in Theorem 1.

Adaptive pooling design: A set of pools is generated from the matrix $\mathbf{M}_1 \in \mathbb{B}^{T_1 \times n}$ whose entries are i.i.d. as $\mathcal{B}(p_1)$. The pools denoted by the rows of $\mathbf{M}_1$ are tested first and all the defectives are classified from the outcome vector $\mathbf{y}_1 \in \mathbb{B}^{T_1 \times 1}$. Denote the number of items declared defectives by $\hat{d}$ and the set of declared defectives by $\{\hat{u}_1, \hat{u}_2, \dots, \hat{u}_{\hat{d}}\}$. If $\hat{d} \neq d$, an error is declared. We keep these declared defectives aside and generate another pooling matrix $\mathbf{M}_2 \in \mathbb{B}^{T_2 \times (n-d)}$, whose entries are i.i.d. as $\mathcal{B}(p_2)$, for the rest of the items. Now, test the pools denoted by the rows of the matrix $\mathbf{M}_2$ along with each of the items declared defectives; the outcomes are denoted by $\mathbf{y}_{\hat{u}_1}, \dots, \mathbf{y}_{\hat{u}_d} \in \mathbb{B}^{T_2 \times 1}$. Each of the two stages of testing is done non-adaptively, as represented in Fig. 4, and hence the pooling scheme is a two-stage adaptive pooling design. The exact values of $p_1$ and $p_2$ are given in Theorem 2 (which appears before the beginning of Sub-section III-A). The scalings of $T_1$ and $T_2$ are also given in Theorem 2 and their exact values are given in (11) and (13) (where $T_i = \beta_i \log n$). The total number of tests is given by $T_1 + dT_2$.
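The construction of the two stages can be sketched as follows. This is a hedged illustration only: the helper names `bernoulli_matrix` and `stage_two_pools`, the sizes, and the values of `p1` and `p2` are assumptions made for the example (the paper takes these parameters from Theorem 2), and the list of declared defectives is hard-coded in place of the actual stage-one decoder.

```python
import random


def bernoulli_matrix(num_tests, num_items, p, rng):
    """Test matrix whose entries are i.i.d. Bernoulli(p)."""
    return [[1 if rng.random() < p else 0 for _ in range(num_items)]
            for _ in range(num_tests)]


def stage_two_pools(M2, remaining_items, declared_defectives):
    """For each declared defective u, add u to every pool (row) of M2."""
    pools = {}
    for u in declared_defectives:
        pools[u] = [
            {u} | {remaining_items[j] for j, bit in enumerate(row) if bit}
            for row in M2
        ]
    return pools


rng = random.Random(0)            # fixed seed, for reproducibility only
n, T1, T2 = 8, 6, 4               # illustrative sizes, not from the paper
p1, p2 = 0.2, 0.3                 # illustrative values, not from Theorem 2
M1 = bernoulli_matrix(T1, n, p1, rng)            # stage 1: all n items
declared = [2, 5]                 # stand-in for the stage-1 decoder output
rest = [w for w in range(n) if w not in declared]
M2 = bernoulli_matrix(T2, len(rest), p2, rng)    # stage 2: remaining items
pools = stage_two_pools(M2, rest, declared)
total_tests = T1 + len(declared) * T2            # T_1 + d * T_2
```

Each declared defective is tested with the same $T_2$ random pools of the remaining items, so no pool of the second stage contains more than one declared defective.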

The defectives are expected to participate in a higher fraction of positive outcome tests than the normal items or the inhibitors. And, once the defectives are identified, tests of each one of them with the rest of the items can be used to determine their associations. We show that this can be done non-adaptively as well. The decoding algorithm proceeds in two steps for both non-adaptive and adaptive pooling designs. The first step identifies the defectives from the outcome vectors $\mathbf{y}$ and $\mathbf{y}_1$ in the non-adaptive and adaptive pooling designs respectively, according to the fraction of positive outcome tests in which an item participates. The second step identifies the inhibitors and their associations with the declared defectives using subsets of the outcome vector $\mathbf{y}$ in the non-adaptive pooling design and the outcome vectors $\mathbf{y}_{\hat{u}_1}, \mathbf{y}_{\hat{u}_2}, \dots, \mathbf{y}_{\hat{u}_d}$ in the adaptive pooling design.

Fig. 4: The proposed two-stage adaptive pooling design scheme. The pooling matrix $\mathbf{M}_2$ is tested along with each of the items $\hat{u}_i$ which are declared defectives. The items non-associated with $\hat{u}_i$ are determined from the outcome vector $\mathbf{y}_{\hat{u}_i}$, for $i = 1, 2, \dots, d$.

Let us define the following notations² with respect to the pools represented by $\mathbf{M}_{NA}$ and $\mathbf{M}_1$, which are eventually useful in characterizing the statistics of the different types of items that are used in the decoding algorithm.

Notations:

• $\mathcal{I}(u)$ denotes the set of inhibitors that the defective $u$ is associated with.

• $\mathcal{E}_{u_k}$ denotes the event that none of the inhibitors associated with a defective $u_k$ appears in a test, given that the defective $u_k$ appears in the test.

• $\mathcal{D}_i^{(j)} \in \mathcal{P}(\{u_1, \dots, u_d\})$ denotes the $j$-th set in the (arbitrarily) ordered set of all $i$-tuple subsets of the defective set, denoted by $\mathcal{D}_i$, for $j = 1, \dots, \binom{d}{i}$, where $u_i$ denotes a defective and $\mathcal{P}(\{u_1, \dots, u_d\})$ denotes the power set of the set of defectives.

• $\mathcal{D}(s)$ denotes the defectives associated with the inhibitor $s$ and its complement is given by $\overline{\mathcal{D}(s)} = \mathcal{D} - \mathcal{D}(s)$.

• $\overline{\mathcal{D}(s)}_i$ denotes the (arbitrarily) ordered set of all $i$-tuple subsets of the defective set $\overline{\mathcal{D}(s)}$, and the $j$-th set in $\overline{\mathcal{D}(s)}_i$ is denoted by $\overline{\mathcal{D}(s)}_i^{(j)}$.

² From hereon, we reserve the notation $u$ to represent a defective, $v$ to represent a normal item, and $s$ to represent an inhibitor.

Example 2: Realizations of the above notations for the association graph in Fig. 2 considered in Example 1 are given below. The inhibitor set is given by $\mathcal{I} = \{s_1, \dots, s_r\} \subset \mathcal{W}$ and the defective set is given by $\mathcal{D} = \{u_1, \dots, u_d\} \subset \mathcal{W}$ with $r = d$. An inhibitor $s_i$ is associated with a distinct defective $u_i$, and so

• $\mathcal{I}(u)$ for $u = u_i$ is given by $\mathcal{I}(u_i) = \{s_i\}$.

• $\mathcal{E}_{u_1}$ represents the event that the inhibitor $s_1$ associated with the defective $u_1$ does not appear in a test, given that the defective $u_1$ appears in the test.

• Realizations of $\mathcal{D}_i$ for $i = 1, 2$ are given by

$$\mathcal{D}_1 = \{\{u_1\}, \{u_2\}, \dots, \{u_d\}\},$$
$$\mathcal{D}_2 = \{\{u_1, u_2\}, \{u_1, u_3\}, \dots, \{u_1, u_d\}, \{u_2, u_3\}, \dots, \{u_2, u_d\}, \dots, \{u_{d-1}, u_d\}\}.$$

• Realizations of $\mathcal{D}_i^{(j)}$ for $(i, j) = (1, 2)$ and $(i, j) = (2, 3)$ are given by

$$\mathcal{D}_1^{(2)} = \{u_2\}, \quad \mathcal{D}_2^{(3)} = \{u_1, u_4\}.$$

• $\mathcal{D}(s)$ for $s = s_1$ is given by $\mathcal{D}(s_1) = \{u_1\}$ and its complement is given by $\overline{\mathcal{D}(s_1)} = \{u_2, \dots, u_d\}$.

• Realizations of $\overline{\mathcal{D}(s)}_i$ for $s = s_1$ and $i = 1, 2$ are given by

$$\overline{\mathcal{D}(s_1)}_1 = \{\{u_2\}, \dots, \{u_d\}\},$$
$$\overline{\mathcal{D}(s_1)}_2 = \{\{u_2, u_3\}, \{u_2, u_4\}, \dots, \{u_2, u_d\}, \{u_3, u_4\}, \dots, \{u_3, u_d\}, \dots, \{u_{d-1}, u_d\}\}.$$

• Realizations of $\overline{\mathcal{D}(s)}_i^{(j)}$ with $s = s_1$, for $(i, j) = (1, 2)$ and $(i, j) = (2, 3)$ are given by

$$\overline{\mathcal{D}(s_1)}_1^{(2)} = \{u_3\}, \quad \overline{\mathcal{D}(s_1)}_2^{(3)} = \{u_2, u_5\}.$$
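The ordered families of $i$-tuple subsets used in Example 2 can be enumerated with `itertools.combinations`. The sketch below assumes $d = 5$ and a lexicographic ordering of the subsets; the paper allows an arbitrary ordering, so the lexicographic choice is only an assumption that happens to reproduce the realizations listed in the example. The helper name `tuple_subsets` is hypothetical.

```python
from itertools import combinations


def tuple_subsets(items, i):
    """Ordered list of all i-element subsets of `items` (lexicographic)."""
    return [set(c) for c in combinations(items, i)]


d = 5                                              # illustrative size
defectives = [f"u{k}" for k in range(1, d + 1)]    # u1, ..., u5
D1 = tuple_subsets(defectives, 1)                  # all 1-tuple subsets
D2 = tuple_subsets(defectives, 2)                  # all 2-tuple subsets

# In Fig. 2 the inhibitor s1 is associated with u1 only, so the complement
# of D(s1) is {u2, ..., u5}.
complement_s1 = [u for u in defectives if u != "u1"]
C1 = tuple_subsets(complement_s1, 1)
C2 = tuple_subsets(complement_s1, 2)

# The paper indexes sets from j = 1, Python lists from 0.
second_singleton = D1[1]             # j = 2 among the 1-tuple subsets
third_pair = D2[2]                   # j = 3 among the 2-tuple subsets
second_singleton_complement = C1[1]  # j = 2 within the complement
third_pair_complement = C2[2]        # j = 3 within the complement
```

With this ordering, the number of $i$-tuple subsets is $\binom{d}{i}$, which is the upper limit of the index $j$ in the notation above.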


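We now define the following statistics corresponding to the different types of items. The following statistics also hold good when $\mathbf{y}_1$ is replaced by $\mathbf{y}$, as entries of both $\mathbf{M}_{NA}$ and $\mathbf{M}_1$ have the same statistics.

$$q_1^{(u)} \triangleq \Pr\{\mathbf{y}_1(l) = 1 \mid \text{defective } u \text{ is present in the } l\text{-th test}\} \geq (1-p_1)^{|\mathcal{I}(u)|} \geq (1-p_1)^{I_{\max}}, \tag{2}$$

$$q_2^{(v)} \triangleq \Pr\{\mathbf{y}_1(l) = 1 \mid \text{normal item } v \text{ is present in the } l\text{-th test}\} = \sum_{i=1}^{d} p_1^i (1-p_1)^{d-i} \sum_{j=1}^{\binom{d}{i}} \Pr\Big\{ \bigcup_{u_k \in \mathcal{D}_i^{(j)}} \mathcal{E}_{u_k} \Big\} = q_2 \tag{3}$$

$$\leq \sum_{i=1}^{d} \binom{d}{i} p_1^i (1-p_1)^{d-i} = 1 - (1-p_1)^d, \tag{4}$$

$$q_3^{(s)} \triangleq \Pr\{\mathbf{y}_1(l) = 1 \mid \text{inhibitor } s \text{ is present in the } l\text{-th test}\} = \begin{cases} \displaystyle\sum_{i=1}^{|\overline{\mathcal{D}(s)}|} p_1^i (1-p_1)^{d-i} \sum_{j=1}^{|\overline{\mathcal{D}(s)}_i|} \Pr\Big\{ \bigcup_{u_k \in \overline{\mathcal{D}(s)}_i^{(j)}} \mathcal{E}_{u_k} \Big\}, & \text{if } |\overline{\mathcal{D}(s)}| \geq 1, \\ 0, & \text{otherwise.} \end{cases} \tag{5}$$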


Since the outer and inner summations in (5) are over a subset of those in (3), $\max_s q_3^{(s)} \leq q_2^{(v)}$. It is also intuitive that a positive outcome for an inhibitor in a test is less probable than that for a normal item. The equality in (3) follows from the fact that a test outcome is positive iff at least one defective appears in the test (which is captured by the outer summation term) and none of the inhibitors associated with at least one of these defectives appears in the test (which is captured by the union of the events $\mathcal{E}_{u_k}$ over $u_k$). A similar explanation holds true for (5). The upper bound in (4) follows from the upper bound of one on the probability terms of (3). In hindsight, the lower bound in (2) and the upper bound in (4) can be easily obtained as follows. The lower bound on the positive outcome statistics for a defective item in (2) follows from the worst case statistics when all the inhibitors inhibit the expression of every defective. The upper bound on the statistics for a normal item in (4) follows by using the best case positive outcome statistics, in the absence of inhibitors, where the appearance of any defective gives a positive test outcome. In the sequel, we shall exploit the difference between (2) and (4) to identify the defectives, notwithstanding the fact that one of them could be a loose bound for specific association graphs. For example, (2) is tight for the 1-inhibitor model whereas (4) could be a loose upper bound for the same association graph, depending on the values of $p_1$, $r$, and $d$. However, fortunately, $p_1$ can be chosen appropriately so that the looseness in the bounds does not affect the scaling of the upper bound on the number of tests required to identify the defectives, and the dominant scaling is determined by the number of tests required to identify the association pattern.

Denote the worst case negative outcome statistic for a defective by

$$b_{\max} = 1 - (1-p_1)^{I_{\max}}. \tag{6}$$
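To see how the bounds in (2), (4), and (6) separate defectives from normal items, they can be evaluated numerically. The sketch below uses illustrative parameter values ($p_1 = 0.1$, $I_{\max} = 1$, $d = 2$) that are assumptions for the example, not the values prescribed by the theorems; the function names are hypothetical.

```python
def defective_lower_bound(p1, i_max):
    """Lower bound (2) on the positive-outcome probability of a defective."""
    return (1 - p1) ** i_max


def normal_upper_bound(p1, d):
    """Upper bound (4) on the positive-outcome probability of a normal item."""
    return 1 - (1 - p1) ** d


def worst_case_negative(p1, i_max):
    """b_max in (6): worst-case negative-outcome probability of a defective."""
    return 1 - (1 - p1) ** i_max


p1, i_max, d = 0.1, 1, 2           # illustrative values only
lower = defective_lower_bound(p1, i_max)
upper = normal_upper_bound(p1, d)
b_max = worst_case_negative(p1, i_max)
gap = lower - upper                # positive gap => the two classes separate
```

For these values the defective bound is about 0.9 and the normal-item bound about 0.19, and $b_{\max}$ is the complement of the bound in (2). A larger $p_1$ shrinks the gap, which is why the choice of $p_1$ matters for identifying the defectives.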


Denote the set of tests corresponding to outcome vector $\mathbf{y}$ in which an item $w_j$ participates by $\mathcal{T}_{w_j}(\mathbf{y})$ and the set of positive outcome tests in which the item $w_j$ participates by $\mathcal{S}_{w_j}(\mathbf{y})$, for $j = 1, 2, \dots, n$. The decoding algorithm is given as follows.

1) Step 1 (Identifying the defectives for both non-adaptive and adaptive pooling designs):

For the non-adaptive pooling design, if $|\mathcal{S}_{w_j}(\mathbf{y})| > |\mathcal{T}_{w_j}(\mathbf{y})|\,[1 - b_{\max}(1+\tau)]$, with $b_{\max}$ as defined in (6), declare the item $w_j$ to be a defective. For the adaptive pooling design, we use the same criterion, replacing $\mathbf{y}$ by $\mathbf{y}_1$. Denote the number of items declared as defectives by $\hat{d}$ and the set of declared defectives by $\{\hat{u}_1, \hat{u}_2, \dots, \hat{u}_{\hat{d}}\}$. If $\hat{d} \neq d$, declare an error. Denote the remaining unclassified items in the population by $\{w'_1, \dots, w'_{n-d}\} = \{w_1, \dots, w_n\} - \{\hat{u}_1, \dots, \hat{u}_d\}$.

2) Step 2 (Identifying the inhibitors and their associations
for non-adaptive pooling design):

Let $\mathcal{P}_k$ denote the set of pools in $M_{NA}$ that contain
only the declared defective $\hat{u}_k$ and none of the other
declared defectives, for $k = 1, \cdots, d$. Also, let the
outcomes corresponding to these pools be positive. This
means that the pools in $\mathcal{P}_k$ do not contain any inhibitor
from the set $\mathcal{I}(\hat{u}_k)$, which denotes the set of inhibitors
associated with the item $\hat{u}_k$ if $\hat{u}_k$ is indeed a defective.
Now, consider only the outcomes corresponding to these
pools, denoted by $y_{\mathcal{P}_1} \subset y, \cdots, y_{\mathcal{P}_d} \subset y$. The
associations of the declared defectives are identified as follows.

• For each $k = 1$ to $d$, declare $(w'_j, \hat{u}_k)$ to be a non-
associated inhibitor-defective pair if $w'_j$ participates
in at least one of the tests corresponding to the
outcome vector $y_{\mathcal{P}_k}$ and declare the rest of the items
to be associated with $\hat{u}_k$.

The items declared as non-associated for all $k$ are
declared to be normal items. If $\mathcal{P}_k = \emptyset$ for some
$k$, declare an error.

3) Step 2 (Identifying the inhibitors and their associations
for adaptive pooling design):

Let $\mathcal{S}(y_{\hat{u}_k})$ denote the set of positive outcome tests
corresponding to $y_{\hat{u}_k}$, i.e., these pools do not contain
any inhibitor from the set $\mathcal{I}(\hat{u}_k)$ if $\hat{u}_k$ is a defective.

• For each $k = 1$ to $d$, declare $(w'_j, \hat{u}_k)$ to be a non-
associated inhibitor-defective pair if $w'_j$ participates
in at least one of the tests in the set $\mathcal{S}(y_{\hat{u}_k})$ and
declare the rest of the items to be associated with
$\hat{u}_k$.

The items declared as non-associated for all $k$ are
declared to be normal items. If $\mathcal{S}(y_{\hat{u}_k}) = \emptyset$ for
some $k$, declare an error.
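
The two steps of the decoder for the non-adaptive design can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `decode_non_adaptive`, the list-of-lists representation of the pooling matrix, and the toy instance below are assumptions made here and are not taken from the paper (in particular, the toy matrix is not the matrix of any example in the text).

```python
def decode_non_adaptive(M, y, b_max, tau, d):
    """Sketch of Steps 1 and 2 for the non-adaptive design.

    M is a T x n 0/1 pooling matrix (list of rows), y the list of T outcomes.
    Returns (declared defectives, {defective: declared associated items}).
    Raises ValueError where the algorithm in the text declares an error.
    """
    T, n = len(M), len(M[0])

    # Step 1: threshold the fraction of positive tests per item.
    defectives = []
    for j in range(n):
        tests_in = [t for t in range(T) if M[t][j]]      # T_{w_j}(y)
        pos_in = [t for t in tests_in if y[t]]           # S_{w_j}(y)
        if len(pos_in) > len(tests_in) * (1.0 - b_max * (1.0 + tau)):
            defectives.append(j)
    if len(defectives) != d:
        raise ValueError("number of declared defectives differs from d")

    # Step 2: use positive pools that contain exactly one declared defective.
    rest = [j for j in range(n) if j not in defectives]
    associations = {}
    for u in defectives:
        pools = [t for t in range(T)
                 if y[t] and M[t][u]
                 and not any(M[t][v] for v in defectives if v != u)]
        if not pools:
            raise ValueError("no useful pool for a declared defective")
        # Items seen in such a pool are non-associated; the rest are associated.
        associations[u] = [w for w in rest
                           if not any(M[t][w] for t in pools)]
    return defectives, associations


# Hypothetical toy instance: items 1 and 3 are defectives, item 0 inhibits
# item 1, item 2 inhibits item 3, and item 4 is normal.
M = [[1, 1, 0, 0, 0], [0, 1, 1, 0, 1], [0, 0, 1, 1, 0], [1, 0, 0, 1, 1],
     [0, 1, 0, 1, 0], [1, 0, 1, 0, 1], [0, 0, 0, 0, 1], [1, 0, 0, 0, 1]]
y = [0, 1, 0, 1, 1, 0, 0, 0]
print(decode_non_adaptive(M, y, b_max=0.4, tau=0.25, d=2))
```

With `b_max = 0.4` and `tau = 0.25` the Step 1 threshold is one half of the tests in which an item participates. Items declared non-associated with every defective (item 4 here) are the ones the algorithm would label normal.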

The following toy example demonstrates the operation of 
the above decoding algorithm for non-adaptive pooling design. 

Example 3: Consider the following non-adaptive pooling
design matrix $M_{NA} \in \mathbb{B}^{5 \times 5}$ and the outcome vector $y \in
\mathbb{B}^{5 \times 1}$ for the underlying association graph shown in Fig. 5. The
item $w_5$ is a normal item. Here, $r = d = 2$, $n = 5$, $T_{NA} = 5$.

Fig. 5: The underlying association graph for Example 3.

We recall that column-$j$ of the matrix $M_{NA}$ corresponds to the
item $w_j$. The threshold for identifying the defectives in Step 1
of the decoding algorithm is such that any item $w_j$ that satisfies
the condition $\frac{|\mathcal{S}_{w_j}(y)|}{|\mathcal{T}_{w_j}(y)|} > \frac{1}{2}$ is declared to be a defective. Now,
observe the operation of the decoding algorithm.

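Step 1: We observe that

$$\frac{|\mathcal{S}_{w_1}(y)|}{|\mathcal{T}_{w_1}(y)|} = \frac{1}{2}, \quad \frac{|\mathcal{S}_{w_2}(y)|}{|\mathcal{T}_{w_2}(y)|} = \frac{2}{3}, \quad \frac{|\mathcal{S}_{w_3}(y)|}{|\mathcal{T}_{w_3}(y)|} = \frac{1}{2}, \quad \frac{|\mathcal{S}_{w_4}(y)|}{|\mathcal{T}_{w_4}(y)|} = \frac{2}{3}, \quad \frac{|\mathcal{S}_{w_5}(y)|}{|\mathcal{T}_{w_5}(y)|} = \frac{1}{2}.$$

Items $w_2$ and $w_4$ are the only items that satisfy the condition
and hence are declared defectives. Therefore,
the declared defectives are given by $\hat{u}_1 = w_2$, $\hat{u}_2 = w_4$ and
the remaining unclassified items are given by $w'_1 = w_1$, $w'_2 =
w_3$, $w'_3 = w_5$.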

Step 2: The “useful” pools used for identifying the “non-associations”
are obtained as $\mathcal{P}_1 = \{3\}$, $\mathcal{P}_2 = \{4\}$. This is
because the third test outcome, in which $\hat{u}_1$ participates and
$\hat{u}_2$ does not participate, is positive, and the fourth test outcome,
in which $\hat{u}_2$ participates and $\hat{u}_1$ does not participate, is also
positive. Since the items $w'_2$ and $w'_3$ participate in the third test,
$(w'_2, \hat{u}_1) = (w_3, w_2)$ and $(w'_3, \hat{u}_1) = (w_5, w_2)$ are declared
to be non-associated inhibitor-defective pairs and $(w'_1, \hat{u}_1) =
(w_1, w_2)$ is declared to be an associated inhibitor-defective
pair. Similarly, $(w'_1, \hat{u}_2) = (w_1, w_4)$ and $(w'_3, \hat{u}_2) = (w_5, w_4)$
are declared to be non-associated item-pairs and $(w'_2, \hat{u}_2) =
(w_3, w_4)$ is declared to be an associated inhibitor-defective
pair. Since the item $w'_3 = w_5$ is declared to be non-associated
with both $\hat{u}_1$ and $\hat{u}_2$, it is declared to be a normal item.

We emphasize that this is a toy example to demonstrate
the operation of the proposed decoding algorithm and is not
representative of the values of $p$ or $\tau$ or $T_{NA}$ for the given
values of $r, d, n$.

Remark 1: (Step 1) The first step in the decoding algorithm,
which is the same for both the non-adaptive and adaptive
pooling designs, is similar to the defective classification
algorithm used in [13] for the 1-inhibitor model. The underlying
common principle is that there exists a statistical
difference between the defective items and the rest of the
items. Hence, with a sufficient number of tests, the defectives
can be classified by “matching” the tests in which an item
participates and the positive outcome tests. The items involved
in a large fraction of positive outcome tests are declared
to be defectives. A similar decoding algorithm was used in
the classical group testing framework with noisy tests [23].
Here, the inhibitors of a defective item, if any, behave like
noise due to their probabilistic presence in a test. The (worst
case) expected number of positive outcome tests in which a
defective participates is at least $|\mathcal{T}_{w_j}(y)|[1 - b_{max}]$. Like in
[13], the Chernoff-Hoeffding concentration inequality [24] is
used to bound the error probability and obtain the exact
number of tests required to achieve a target (vanishing) error
probability. It is important to note that, a priori, it is not clear
if a fixed threshold technique can sieve the defectives under
worst-case positive outcome statistics and the rest of the items
under best-case positive outcome statistics, with vanishing
error probability. The fact that this is indeed possible will
be proved in the following sub-section.

Remark 2: (Step 2) In the IDG model, the inhibitors for
each defective might be distinct. Hence, an inhibitor for one
defective behaves as a normal item from the perspective of
another defective. This defective-specific interaction is absent
in the 1-inhibitor model. So, any inhibitor can be identified
using any defective, i.e., an inhibitor's behaviour is defective-invariant
in the 1-inhibitor model, which was exploited in identifying
the inhibitors in [13]. Since each inhibitor's behaviour
can be defective-specific in the IDG model, we need to identify
the defectives first and then identify their associated inhibitors
by observing the interaction of the other items with each of
these defectives.


The following theorems state the values of the parameters
$p_1$, $p_2$, and $\tau$, and the scaling of the number of tests required
for the proposed non-adaptive and adaptive pooling designs to
determine the association graph with high probability. Similar
results can be stated for the IDG-NSI model by replacing $I_{max}$
by $r$ in the following theorems.


Theorem 1 (Non-adaptive pooling design): Choose the
pooling design matrix $M_{NA}$ of size $T_{NA} \times n$ with its entries
chosen i.i.d. as $\mathcal{B}(p_1)$ with $p_1 = \frac{1}{3(I_{max}+d)}$ for the IDG-WSI
model. Test the pools denoted by the rows of the matrix
$M_{NA}$ non-adaptively. The scaling of the number of tests
sufficient to guarantee vanishing error probability (1) using
the proposed decoding algorithm with $\tau = \frac{1 - b_{max} - q_2^{UB}}{2 b_{max}}$ is
given by $T_{NA} = O\left((I_{max} + d)^2 \log n\right)$, where $q_2^{UB}$ and
$b_{max}$ are defined in (4) and (6) respectively.

Theorem 2 (Adaptive pooling design): Choose the pooling
design matrices $M_1$ and $M_2$ of sizes $T_1 \times n$ and $T_2 \times n$ with
their entries chosen i.i.d. as $\mathcal{B}(p_1)$ and $\mathcal{B}(p_2)$ respectively,
with $p_1 = \frac{1}{3(I_{max}+d)}$ and $p_2 = \frac{1}{2 I_{max}}$ for the IDG-WSI
model. Test the pools denoted by the rows of the matrix
$M_1$ non-adaptively and classify the defectives. Now, test each
of the pools from $M_2$ along with the $d$ classified defectives
individually. The scaling of the number of tests sufficient to
guarantee vanishing error probability (1) using the proposed
decoding algorithm with $\tau = \frac{1 - b_{max} - q_2^{UB}}{2 b_{max}}$ is given by
$T_A = T_1 + dT_2 = O(I_{max}\, d \log n)$, where $q_2^{UB}$ and $b_{max}$
are defined in (4) and (6) respectively.

Remark 3: The value of $\tau = \frac{1 - b_{max} - q_2^{UB}}{2 b_{max}}$ chosen in
the above theorems implies that the decoding algorithm declares
item $w_j$ to be a defective if $\frac{|\mathcal{S}_{w_j}(y)|}{|\mathcal{T}_{w_j}(y)|}$ (respectively
$\frac{|\mathcal{S}_{w_j}(y_1)|}{|\mathcal{T}_{w_j}(y_1)|}$ for the adaptive design) exceeds
$\frac{(1 - b_{max}) + q_2^{UB}}{2}$. This threshold is simply an average between
the worst-case positive outcome statistic for a defective and
the best-case positive outcome statistic for a normal item or
an inhibitor. The values of $p$ and $p_1$ are chosen so that the
former is greater than the latter.
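
The averaging interpretation of the threshold can be checked numerically. The sketch below is illustrative only: the function name `step1_threshold` is hypothetical, and the closed form used for $q_2^{UB}$, namely $1 - (1-p_1)^d$, is an assumption inferred from the best-case statistic described in the text (any defective in the pool yields a positive outcome).

```python
def step1_threshold(p1, i_max, d):
    """Return (b_max, q2_ub, tau, threshold) for the Step 1 test."""
    b_max = 1.0 - (1.0 - p1) ** i_max        # worst-case negative statistic, eq. (6)
    q2_ub = 1.0 - (1.0 - p1) ** d            # assumed best-case positive statistic
    tau = (1.0 - b_max - q2_ub) / (2.0 * b_max)
    threshold = 1.0 - b_max * (1.0 + tau)    # fraction of positive tests required
    return b_max, q2_ub, tau, threshold


i_max, d = 4, 3
p1 = 1.0 / (3 * (i_max + d))
b_max, q2_ub, tau, thr = step1_threshold(p1, i_max, d)
# The threshold is the midpoint of (1 - b_max) and q2_ub.
print(thr, ((1.0 - b_max) + q2_ub) / 2.0)
```

For this choice of $p_1$ the defective statistic $1 - b_{max}$ lies above $q_2^{UB}$, so $\tau$ is positive and the midpoint separates the two.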

The following sub-section constitutes the proof of the above
theorems. The exact number of tests required to guarantee
vanishing error probability for recovery of the association
graph is also obtained. The proof is exactly the same for
the IDG-NSI model, but replacing $I_{max}$ by $r$.


A. Error Analysis of the Proposed Algorithm

As mentioned in Section II, we require that

$$\max_{\mathcal{I},\mathcal{D},\mathcal{E}(\mathcal{I},\mathcal{D})} \Pr\left\{(\hat{\mathcal{I}},\hat{\mathcal{D}},\hat{\mathcal{E}}(\hat{\mathcal{I}},\hat{\mathcal{D}})) \neq (\mathcal{I},\mathcal{D},\mathcal{E}(\mathcal{I},\mathcal{D}))\right\} \leq c\,n^{-\delta}$$

for some constant $c > 0$ and fixed $\delta > 0$. For the non-adaptive
pooling design, we find the number of tests $T_{NA}$ required
to upper bound the error probability of the first step of the
decoding algorithm by $c_1 n^{-\delta_1}$ and that of the second step
of the decoding algorithm by $c_2 n^{-\delta_2}$, for some constants $c_1$
and $c_2$. A similar approach is taken for the two-stage adaptive
pooling design to find the number of tests $T_1$ and the value of
$T_2$. Finally, the values of $\delta_1$ and $\delta_2$ are chosen so that the total
error probability is upper bounded by $c\,n^{-\delta}$, for some constant
$c$ and given $\delta > 0$.

1) Error Analysis of the First Step: Since the first step of
the decoding algorithm is the same for both the non-adaptive
and adaptive pooling designs, the bounds on the number of
tests obtained below for the adaptive pooling design apply to
the non-adaptive pooling design also. The three possible error
events in the first step of the decoding algorithm for both
non-adaptive and adaptive pooling designs are given by

1) A defective is not declared as one. 

2) A normal item is declared as a defective. 

3) An inhibitor is declared as a defective. 

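Clearly, the defective that has the largest probability of a
negative outcome, given by $b_{1max} = \max_k \left(1 - q_1^{(u_k)}\right)$, has
the largest probability of not being declared as a defective.
So, with $T_1 = \beta_1 \log n$, the probability of the first error event
for all the defectives can be upper bounded (using the union
bound over all defectives) as

$$d \sum_{t=0}^{T_1} \binom{T_1}{t} p_1^t (1-p_1)^{T_1-t} \sum_{v = t\,b_{max}(1+\tau)}^{t} \binom{t}{v} b_{1max}^{v} (1-b_{1max})^{t-v}$$

$$= d \sum_{t=0}^{T_1} \binom{T_1}{t} p_1^t (1-p_1)^{T_1-t} \sum_{v = t\,(b_{1max} + (b_{max} - b_{1max} + b_{max}\tau))}^{t} \binom{t}{v} b_{1max}^{v} (1-b_{1max})^{t-v}$$

$$\overset{(a)}{\leq} d \sum_{t=0}^{T_1} \binom{T_1}{t} p_1^t\, e^{-2t(b_{max} - b_{1max} + b_{max}\tau)^2} (1-p_1)^{T_1-t}$$

$$\overset{(b)}{=} d \left[1 - p_1 + p_1 e^{-2(b_{max} - b_{1max} + b_{max}\tau)^2}\right]^{\beta_1 \log n}$$

$$\overset{(c)}{\leq} d \exp\left\{-\beta_1 p_1 \log n \left(1 - e^{-2(b_{max} - b_{1max} + b_{max}\tau)^2}\right)\right\}$$

$$\overset{(d)}{\leq} d \exp\left\{-\beta_1 p_1 \log n\, (1 - e^{-2})(b_{max} - b_{1max} + b_{max}\tau)^2\right\} \leq n^{-\delta_1},$$

$$\text{if } \beta_1 \geq \frac{\left(\frac{\ln d}{\ln n} + \delta_1\right)\ln 2}{p_1 (1 - e^{-2})(b_{max} - b_{1max} + b_{max}\tau)^2},$$

where (a) follows from the Chernoff-Hoeffding bound [24], (b)
follows from binomial expansion, (c) follows from the fact that
$1 - c \leq e^{-c}$, and (d) follows from the fact that $1 - e^{-2x^2} \geq
(1 - e^{-2})x^2$, for $0 \leq x \leq 1$. Using the fact that $b_{1max} \leq
b_{max}$, where $b_{max}$ is defined in (6), the following bound on
$\beta_1$ suffices.

$$\beta_1 \geq \frac{\left(\frac{\ln d}{\ln n} + \delta_1\right)\ln 2}{p_1 (1 - e^{-2})(b_{max}\tau)^2}. \tag{7}$$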
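
Step (d) rests on the elementary inequality $1 - e^{-2x^2} \geq (1 - e^{-2})x^2$ on $[0, 1]$, which holds because $1 - e^{-2u}$ is concave in $u = x^2$ and meets the chord $(1 - e^{-2})u$ at $u = 0$ and $u = 1$. A quick numerical sanity check is sketched below; the function name, grid size, and tolerance are arbitrary choices made here, not part of the paper.

```python
import math


def inequality_d_holds(steps=1000, tol=1e-15):
    """Check 1 - exp(-2 x^2) >= (1 - exp(-2)) x^2 on a uniform grid of [0, 1]."""
    c = 1.0 - math.exp(-2.0)
    for k in range(steps + 1):
        x = k / steps
        if 1.0 - math.exp(-2.0 * x * x) < c * x * x - tol:
            return False
    return True


print(inequality_d_holds())
```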

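Similarly, to guarantee vanishing probability for the second
error event (union-bounded over all normal items) and the third
error event (union-bounded over all inhibitors), it suffices that

$$\beta_1 \geq \frac{\left(\frac{\ln(n-d-r)}{\ln n} + \delta_1\right)\ln 2}{p_1 (1 - e^{-2})\left(1 - b_{max}(1+\tau) - q_2\right)^2}, \qquad \beta_1 \geq \frac{\left(\frac{\ln r}{\ln n} + \delta_1\right)\ln 2}{p_1 (1 - e^{-2})\left(1 - b_{max}(1+\tau) - q_3^{(s)}\right)^2}. \tag{8}$$

Since $\max_s q_3^{(s)} \leq q_2$ and $r = o(n)$, the second bound in (8) is
asymptotically redundant for all values of $r$. So, substituting
the upper bound on $q_2$ given by $q_2^{UB}$, it suffices that

$$\beta_1 \geq \frac{\left(\frac{\ln(n-d-r)}{\ln n} + \delta_1\right)\ln 2}{p_1 (1 - e^{-2})\left(1 - b_{max}(1+\tau) - q_2^{UB}\right)^2}. \tag{9}$$

Now, the value of $\tau$ chosen to optimize the denominators of
(7) and (9) is given by $\tau = \frac{1 - b_{max} - q_2^{UB}}{2 b_{max}}$. Therefore, we have

$$\beta_1 \geq \max\left\{ \frac{4\left(\frac{\ln d}{\ln n} + \delta_1\right)\ln 2}{p_1 (1 - e^{-2})\left(1 - b_{max} - q_2^{UB}\right)^2},\; \frac{4\left(\frac{\ln(n-d-r)}{\ln n} + \delta_1\right)\ln 2}{p_1 (1 - e^{-2})\left(1 - b_{max} - q_2^{UB}\right)^2} \right\}. \tag{10}$$

3 If the term $b_{max}(1 + \tau) > 1$, then the probability of the error event under
consideration is equal to zero. So, it can be assumed that $b_{max}(1 + \tau) \leq 1$.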


The term $1 - b_{max} - q_2$ can be lower bounded as follows.

$$1 - b_{max} - q_2 \geq (1 - p_1)^{I_{max}} - \left(1 - (1 - p_1)^d\right) \geq 1 - (I_{max} + d)\,p_1.$$

The last lower bound above follows from the fact that
$(1 - p_1)^{I_{max}} \geq (1 - I_{max}\,p_1)$ and $(1 - p_1)^d \geq (1 - d\,p_1)$.
Optimizing the denominator terms of (10) with respect to $p_1$,
we have $p_1 = \frac{1}{3(I_{max}+d)}$. Hence, using $r, d = o(n)$ in (10),
for sufficiently large $n$ it suffices that

$$\beta_{NA}, \beta_1 \geq \frac{27\,(I_{max} + d)\left(\frac{\ln(n-d-r)}{\ln n} + \delta_1\right)\ln 2}{(1 - e^{-2})}, \tag{11}$$

where $T_{NA} = \beta_{NA} \log n$.
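
The choice of $p_1$ can be reproduced numerically: after the lower bound above, the denominator of (10) is proportional to $p_1\left(1 - (I_{max}+d)p_1\right)^2$, whose maximiser is $p_1 = \frac{1}{3(I_{max}+d)}$ with maximum value $\frac{4}{27(I_{max}+d)}$, which is where the constant 27 comes from. The sketch below is a grid search under these assumptions; the function names and grid resolution are illustrative and not from the paper.

```python
def denominator_term(p1, i_max, d):
    """p1 * (1 - (i_max + d) * p1)^2, the p1-dependent part of the bound."""
    return p1 * (1.0 - (i_max + d) * p1) ** 2


def best_p1_on_grid(i_max, d, steps=30000):
    """Grid search for the maximiser of denominator_term over (0, 1/(i_max+d))."""
    a = i_max + d
    grid = [k / (steps * a) for k in range(1, steps)]
    return max(grid, key=lambda p: denominator_term(p, i_max, d))


i_max, d = 5, 4
print(best_p1_on_grid(i_max, d), 1.0 / (3 * (i_max + d)))
```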

2) Error Analysis of the Second Step: In the error analysis
of the second step, we assume that all the defectives have
been correctly declared. Errors due to error propagation from
the first step shall be analyzed later.

Non-adaptive pooling design:

The only error event for the non-adaptive pooling design in
the second step is that there does not exist a set of pools
$\mathcal{P}_k$ such that they contain only the defective $u_k$ and none of
its associated inhibitors $\mathcal{I}(u_k)$, and all its non-associated items
appear in at least one of such pools. Denote this error event by
$\mathscr{E}(u_k)$. Clearly, none of the inhibitors associated with $u_k$ will
be declared as non-associated with $u_k$. This follows from the
definition of the set of pools $\mathcal{P}_k$ and the decoding algorithm.

The probability of the favourable event that a non-associated
item appears along with a defective $u_k$, but none of its
associated inhibitors and none of the other defectives appear
in a pool from $M_{NA}$, is given by

$$b^{(u_k)} \triangleq p_1^2\,(1 - p_1)^{|\mathcal{I}(u_k)|}(1 - p_1)^{d-1}.$$

Now, the probability of the error event $\mathscr{E}(u_k)$ is upper bounded
by

$$\Pr\{\mathscr{E}(u_k)\} \leq (n - d - |\mathcal{I}(u_k)|)\left(1 - b^{(u_k)}\right)^{T_{NA}} \leq (n - d - |\mathcal{I}(u_k)|)\,e^{-T_{NA} b^{(u_k)}} \leq n^{-\delta_2},$$

$$\text{if } \beta_{NA} \geq \frac{\left(\frac{\ln(n-d-|\mathcal{I}(u_k)|)}{\ln n} + \delta_2\right)\ln 2}{b^{(u_k)}}.$$

Since $(1 - p_1)^{|\mathcal{I}(u_k)|} \geq (1 - p_1)^{I_{max}} \geq (1 - I_{max}\,p_1)$ and
$(1 - p_1)^{d-1} \geq (1 - d\,p_1)$, substituting for $p_1$, it suffices that

$$\beta_{NA} \geq \frac{81}{4}(I_{max} + d)^2\left(\frac{\ln(n-d)}{\ln n} + \delta_2\right)\ln 2. \tag{12}$$

Adaptive pooling design:

Like in the non-adaptive pooling design, the only error event,
denoted by $\mathscr{E}(u_k)$, is that items $w_j$ not associated with $u_k$ are
declared as associated inhibitors, i.e., the item $w_j$ does not
appear in any of the positive outcome tests $\mathcal{S}(y_{u_k})$. Clearly,
none of the inhibitors associated with $u_k$ will be declared as
non-associated with $u_k$.

Let $T_2 = \beta_2 \log n$. The number of tests required to guarantee
vanishing error probability for the error event $\mathscr{E}(u_k)$ is
evaluated as follows. Let $w_j \notin \mathcal{I}(u_k)$. Define

$$a^{(u_k)} \triangleq \Pr\{y_{u_k}(l) = 1 \mid w_j \text{ is present in the } l^{th} \text{ test}\} \geq (1 - p_2)^{|\mathcal{I}(u_k)|}.$$

Now, we have

$$\Pr\{\mathscr{E}(u_k)\} \leq (n - d - |\mathcal{I}(u_k)|)\left(1 - a^{(u_k)} p_2\right)^{T_2} \leq n^{-\delta_2},$$

$$\text{if } \beta_2 \geq \frac{\left(\frac{\ln(n-d-|\mathcal{I}(u_k)|)}{\ln n} + \delta_2\right)\ln 2}{p_2\,(1 - p_2)^{|\mathcal{I}(u_k)|}}.$$

Using the fact that $(1 - p_2)^{|\mathcal{I}(u_k)|} \geq 1 - |\mathcal{I}(u_k)|\,p_2$, and
substituting $p_2 = \frac{1}{2 I_{max}}$, we have the following bound.

$$\beta_2 \geq 4 I_{max}\left(\frac{\ln(n - d - |\mathcal{I}(u_k)|)}{\ln n} + \delta_2\right)\ln 2$$

$$\Leftarrow \beta_2 \geq 4 I_{max}\left(\frac{\ln(n - d)}{\ln n} + \delta_2\right)\ln 2. \tag{13}$$


3) Analysis of Total Error Probability: Assuming that the
target total error probability is $O(n^{-\delta})$, the values of $\delta_1$ and $\delta_2$
need to be determined. Towards that end, define the following
events.

$\mathcal{A}_{ij}$ = Event of declaring $(w_i, w_j)$, $i \neq j$, to be an associated
pair,

$\mathcal{W}$ = Event that at least one actual defective has not been
declared as a defective.

Let $\mathcal{E}$ denote the correct association pattern for some realization
$\{\mathcal{I}, \mathcal{D}\}$. Now, the total probability of error is given by

$$\Pr\Big\{\bigcup_{(w_i,w_j)\notin\mathcal{E}} \mathcal{A}_{ij} \cup \mathcal{W}\Big\} \leq \sum_{(w_i,w_j)\notin\mathcal{E}} \Pr\{\mathcal{A}_{ij}\} + \Pr\{\mathcal{W}\}$$

$$\leq \sum_{\substack{w_i \neq w_j \\ w_j \in \mathcal{N}\cup\mathcal{I}}} \Pr\{\mathcal{A}_{ij}\} + \sum_{w_j \in \mathcal{D}}\;\sum_{w_i \notin \mathcal{I}(w_j)} \Pr\{\mathcal{A}_{ij}\} + \Pr\{\mathcal{W}\} \tag{14}$$

$$\leq n \times 2n^{-\delta_1} + d\,n^{-\delta_2} + n^{-\delta_1}. \tag{15}$$

There are two possible ways in which the event $\mathcal{A}_{ij}$, for
$(w_i, w_j) \notin \mathcal{E}$, can occur. One possibility is that the item
$w_j$ has been erroneously declared as a defective in the first
step of the algorithm, and hence any item $w_i$ declared to
be associated with $w_j$ is an erroneous association. The first
term in (14) represents this possibility. The other possibility
is that $w_j$ has been correctly identified as a defective, but the
item $w_i$ is erroneously declared to be associated with $w_j$. The
second term in (14) represents this possibility. The last term
accounts for the fact that a defective might be missed out in
the first step of the algorithm. Note that the other two terms
do not capture this error event. Finally, (15) follows from the
error analysis of the first and second steps of the decoding
algorithm. Therefore, if the target error probability is $O(n^{-\delta})$,
then choose $\delta_1, \delta_2 = \delta + 1$.

Recall that the number of tests required for non-adaptive
and adaptive pooling designs are given by $T_{NA} = \beta_{NA} \log n$
and $T_A = T_1 + dT_2 = (\beta_1 + d\beta_2) \log n$ respectively.
Therefore, from (11), (12), and (13) we have that $T_{NA} =
O\left((I_{max} + d)^2 \log n\right)$ and $T_A = O(I_{max}\,d \log n)$.


B. Adaptation for the IDG-NSI Model

The only modification required in the pooling design and
decoding algorithm proposed for the IDG-WSI model to adapt
it for the IDG-NSI model is that $I_{max}$ is replaced by $r$. For
the sake of clarity, we list the only changes below.

1) The pooling design parameters are chosen as $p = p_1 =
\frac{1}{3(r+d)}$, $p_2 = \frac{1}{2r}$.

2) In Step 1 of the decoding algorithm the threshold for
identifying the defectives is chosen as $|\mathcal{S}_{w_j}(y_1)| >
|\mathcal{T}_{w_j}(y_1)|[1 - b_{max}(1 + \tau)]$, where $b_{max} = 1 - (1 - p_1)^r$.
Intuitively, this worst-case threshold corresponds to a
scenario where every inhibitor inhibits every defective,
i.e., the 1-inhibitor model.

3) The values of $\beta_{NA}$, $\beta_1$ and $\beta_2$ are chosen as

$$\beta_{NA} \geq \max\left\{ \frac{27\,(r + d)\left(\frac{\ln(n-d-r)}{\ln n} + \delta_1\right)\ln 2}{(1 - e^{-2})},\; \frac{81}{4}(r + d)^2\left(\frac{\ln(n-d)}{\ln n} + \delta_2\right)\ln 2 \right\},$$

$$\beta_1 \geq \frac{27\,(r + d)\left(\frac{\ln(n-d-r)}{\ln n} + \delta_1\right)\ln 2}{(1 - e^{-2})},$$

$$\beta_2 \geq 4r\left(\frac{\ln(n - d)}{\ln n} + \delta_2\right)\ln 2.$$

Hence, the total number of tests required for the IDG-NSI
model scales as $T_{NA} = O\left((r + d)^2 \log n\right)$ for the non-
adaptive pooling design and $T_A = O(rd \log n)$ for the two-
stage adaptive pooling design.
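
To get a feel for the constants, the sufficient test counts can be evaluated for concrete parameters. The sketch below assumes that the bounds have the form of (11), (12) and (13) with $\delta_1 = \delta_2 = \delta + 1$ and base-2 logarithms in $T = \beta \log n$; the function name `required_tests` is hypothetical. For the IDG-NSI model, pass `i_max = r`.

```python
import math


def required_tests(n, d, r, i_max, delta):
    """Sufficient test counts (T_NA, T_A), assuming the forms of (11)-(13)."""
    d1 = d2 = delta + 1.0
    a = i_max + d
    ratio1 = math.log(n - d - r) / math.log(n)
    ratio2 = math.log(n - d) / math.log(n)
    # First-step bound, assumed form of (11).
    beta_1 = 27.0 * a * (ratio1 + d1) * math.log(2) / (1.0 - math.exp(-2.0))
    # Second-step bound for the non-adaptive design, assumed form of (12).
    beta_na = max(beta_1, 81.0 / 4.0 * a ** 2 * (ratio2 + d2) * math.log(2))
    # Second-stage bound for the adaptive design, assumed form of (13).
    beta_2 = 4.0 * i_max * (ratio2 + d2) * math.log(2)
    t_na = math.ceil(beta_na * math.log2(n))
    t_a = math.ceil((beta_1 + d * beta_2) * math.log2(n))
    return t_na, t_a


print(required_tests(n=10000, d=5, r=5, i_max=3, delta=1.0))
```

For small problem sizes these sufficient counts can exceed $n$; the bounds are meaningful in the asymptotic regime $r, d = o(n)$ considered in the text.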

In the next section, lower bounds on the number of tests for 
non-adaptive and adaptive pooling designs are obtained. 


IV. Lower Bounds for Non-Adaptive and Adaptive 
Pooling Design 

In this section, two lower bounds on the number of tests 
required for non-adaptive pooling designs for solving the IDG- 
NSI and IDG-WSI problems with vanishing error probability 
are obtained. One of the lower bounds is simply obtained 
by counting the entropy in the system and this lower bound 
also holds good for adaptive pooling designs. The other lower 
bound is obtained using a lower bound result for the 1-inhibitor 
model which is stated below. We recall that all the inhibitors 
inhibit the expression of every defective in the 1-inhibitor 
model. 

Theorem 3 (Th. 1, [13]): An asymptotic lower bound on
the number of tests required for non-adaptive pooling designs
in order to classify $r$ inhibitors amidst $d$ defectives and
$n - d$ normal items in the 1-inhibitor model is given by
$\Omega\left(\frac{r^2}{d \log r} \log n\right)$ in the $d = o(r)$, $r = o(n)$ regime.^4

The second lower bound in the following theorem dominates
in the large inhibitor regime, i.e., when the number of inhibitors
is large compared to the number of defectives. It conveys
the number of tests required to identify the inhibitors alone.
Though the inhibitors outnumber the defectives, they can be
identified only in the presence of an associated defective. So,
the worst scenario (in terms of number of tests) happens when
most inhibitors have to be identified using a single defective,
or in other words, all of the inhibitors happen to inhibit
a single defective. The third lower bound in the following
theorem exploits the intuition gained from Step 2 of the
decoding algorithm for non-adaptive pooling design (given in
Section III). This lower bound is obtained by characterizing the
minimum number of tests required to identify the associations
of every defective. Since no two defectives might be associated
with a single inhibitor, it is necessary that no two defectives
participate in the same test from which the associations
of a defective are identified. Otherwise, the non-associated
defective masks the effect of association of the associated
inhibitor-defective pair. This might result in wrongly declaring
the associated inhibitor-defective pair to be non-associated.

4 Though Theorem 1 in [13] is stated for the classification of both the
defectives and inhibitors in the 1-inhibitor model, it is also valid for classification
of inhibitors alone. This is because the entropy in the system is dominated
by the number of inhibitors, in the large inhibitor regime.

Throughout this section, lowercase alphabets are used for 
defectives and inhibitors whose realizations are revealed by 
a genie and uppercase alphabets are used for those whose 
realizations are unknown. 

Theorem 4: An asymptotic lower bound on the number of
tests required for non-adaptive pooling designs for solving the
IDG-NSI problem with vanishing error probability for $r, d =
o(n)$ is given by

$$\max\left\{\Omega\left((r + d)\log n + rd\right),\; \Omega\left(\frac{r^2}{\log r}\log n\right),\; \Omega(d^2)\right\}.$$

Proof: The proof for the first lower bound on the number
of tests follows by lower bounding the total number of possible
realizations of the sets of inhibitors, defectives, and association
patterns.

$$T_{NA} \geq H(\mathcal{I},\mathcal{D},\mathcal{E}(\mathcal{I},\mathcal{D})) = H(\mathcal{D}) + H(\mathcal{I}\,|\,\mathcal{D}) + H(\mathcal{E}(\mathcal{I},\mathcal{D})\,|\,(\mathcal{I},\mathcal{D})) \tag{16}$$

$$= \Omega\left((r + d)\log n + rd\right), \tag{17}$$

where $i_j$ denotes the number of defectives that the $j^{th}$ inhibitor
can be associated with, and the last step follows by using
Stirling's lower bound $\binom{d}{d/2} \geq 2^{d/2}$. This lower bound on the
number of tests is also valid for adaptive pooling designs.
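
The counting argument can be made concrete with a short computation. The sketch below is an illustration under stated assumptions, not the paper's exact expression: it assumes a uniform prior, chooses the $d$ defectives among $n$ items, the $r$ inhibitors among the remaining $n - d$, and lets each inhibitor be associated with any non-empty subset of the defectives, giving $(2^d - 1)^r$ patterns. The function name is hypothetical.

```python
import math


def entropy_lower_bound_bits(n, d, r):
    """log2 of the number of (defectives, inhibitors, association) realizations,
    assuming every inhibitor is associated with a non-empty subset of defectives."""
    h_defectives = math.log2(math.comb(n, d))
    h_inhibitors = math.log2(math.comb(n - d, r))
    h_pattern = r * math.log2(2 ** d - 1)
    return h_defectives + h_inhibitors + h_pattern


n, d, r = 1000, 10, 10
print(entropy_lower_bound_bits(n, d, r), (r + d) * math.log2(n) + r * d)
```

The association term contributes roughly $rd$ bits, and the two binomial terms contribute on the order of $(r + d)\log n$ bits when $r, d = o(n)$, in line with (17).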

The second lower bound for non-adaptive pooling designs is
obtained as follows. Assume that it is required to identify the
inhibitors alone. Clearly, this requires no more tests
than the problem of identifying the association graph. Since
the objective is to satisfy the error metric in (1), the error
probability criterion

$$\Pr\{\hat{\mathcal{I}} \neq \mathcal{I}\} \leq c\,n^{-\delta} \tag{18}$$

has to be satisfied for all possible association patterns $\mathcal{E}$
on all possible realizations of $(\mathcal{I}, \mathcal{D})$. Let PD-DA denote
a (pooling design, decoding algorithm) tuple that satisfies
(18). Further, let $T_{NA}(\text{PD-DA}, \mathcal{I}, \mathcal{D}, \mathcal{E})$ denote the
minimum number of tests required by PD-DA to satisfy (18) for a
particular realization of $(\mathcal{I}, \mathcal{D}, \mathcal{E})$. We now have to determine a
lower bound on $\inf \sup T_{NA}$, where the infimum is over all such
tuples. We now have

$$\inf_{\text{PD-DA}}\;\sup_{(\mathcal{I},\mathcal{D},\mathcal{E})} T_{NA} \geq \inf_{\text{PD-DA}}\;\sup_{(\mathcal{I},\mathcal{D},\mathcal{E}')} T_{NA},$$

where $\mathcal{E}'$ denotes a specific class of association pattern
represented in Fig. 6.

Fig. 6: The class of association pattern used to obtain the
second lower bound, illustrated for some realization of $(\mathcal{I}, \mathcal{D})$.
A single defective is associated with all the inhibitors, but none
of the other defectives are associated with any inhibitor.

Now, assume that a genie reveals the set of
defectives $\{u_2, \cdots, u_d\}$ which are not associated with
any of the inhibitors. A lower bound for this problem with
side information from the genie is clearly a lower bound for
the original problem. A lower bound on the number of tests
$T'_{NA}$ for this problem is given by^5

r f = (log (";“))■ 

Note that the presence of any defective from the revealed set of
defectives in a pool always gives a positive outcome, and hence provides zero
information for distinguishing the inhibitors from the rest of
the items as the entropy of such an outcome is zero. So, we
assume that none of the tests contain items from this set.
Therefore, the inhibitor identification problem for items with
the association pattern as given in Fig. 6 is now reduced to
the problem of identifying $r$ inhibitors amidst $n - d$ normal
items and one defective item in the 1-inhibitor model, where
$d = o(n)$. For this problem, using Theorem 3, it follows
that the lower bound on the number of tests is given by
$T'_{NA} = \Omega\left(\frac{r^2}{\log r}\log n\right)$. Hence, this is also a lower bound on
the number of tests required to identify the association graph
with vanishing worst-case error probability.

5 A similar expression is used to obtain Theorem 3. This is derived formally
using Fano's inequality. The steps involved are illustrated in the proof of the
third lower bound.
The evaluation of the third lower bound is involved and is
obtained as follows. Since the second lower bound is tighter
when $r > d$, here we assume that $r < d$. Using similar
arguments as in the second lower bound, a lower bound on the
number of tests for the following reduced problem is a lower
bound for the original problem. Let $\{s_2, \cdots, s_r\}$ denote a
set of inhibitors associated with exactly one defective $u_d$. The
defective $u_d$ is not inhibited by the inhibitor $S_1$. Further, the
inhibitor $S_1$ is associated with exactly one of the defectives
$\{U_1, \cdots, U_{d-1}\}$. This association graph is depicted in Fig. 7.
Let a genie reveal the set of inhibitors $\mathcal{I}_{-S_1} \triangleq \{s_2, \cdots, s_r\}$
and the defective $u_d$. The “residual message” in the system
is now given by $W' = \{S_1, \mathcal{D}_{-u_d}, \mathcal{E}(S_1, \mathcal{D}_{-u_d})\}$, where
$\mathcal{D}_{-u_d} \triangleq \mathcal{D} \setminus u_d$.

Fig. 7: A possible association graph, where $r - 1$ inhibitors
and a single defective are associated only among themselves.
The remaining inhibitor is associated with exactly one of the
remaining defectives.

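For the reduced problem, we have

$$\max_{S_1,\,\mathcal{D}_{-u_d},\,\mathcal{E}(S_1,\mathcal{D}_{-u_d})} \Pr\left\{(\hat{S}_1, \hat{\mathcal{D}}_{-u_d}, \hat{\mathcal{E}}(\hat{S}_1, \hat{\mathcal{D}}_{-u_d})) \neq (S_1, \mathcal{D}_{-u_d}, \mathcal{E}(S_1, \mathcal{D}_{-u_d}))\right\} \tag{19}$$

$$\geq \mathbb{E}_f\left[\Pr\left\{(\hat{S}_1, \hat{\mathcal{D}}_{-u_d}, \hat{\mathcal{E}}(\hat{S}_1, \hat{\mathcal{D}}_{-u_d})) \neq (S_1, \mathcal{D}_{-u_d}, \mathcal{E}(S_1, \mathcal{D}_{-u_d}))\right\}\right] \triangleq P_{e_{avg}}, \tag{20}$$

where $f$ denotes a probability mass function of the association
graph such that

$$\Pr\{(s_1, u) \in \mathcal{E}(s_1, \mathfrak{d}_{-u_d})\} = \frac{1}{d-1}, \;\forall u \in \mathfrak{d}_{-u_d}, \qquad \Pr\{s_1, \mathfrak{d}_{-u_d}\} = \frac{1}{\binom{n-r}{d-1}(n - r - d + 1)}, \tag{21}$$

for any realization of $(S_1, \mathcal{D}_{-u_d})$ given by $(s_1, \mathfrak{d}_{-u_d})$. So,
a lower bound on the number of tests required to achieve
vanishing average error probability $P_{e_{avg}}$ in (20) is also a
lower bound on the number of tests required to achieve
vanishing maximum error probability in (19). These in turn
give a lower bound on the number of tests for the original
problem.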

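Using Fano's inequality, we have^6

$$H[\mathcal{E}(S_1, \mathcal{D}_{-u_d})\,|\,S_1, \mathcal{D}_{-u_d}, \mathcal{I}_{-S_1}, u_d] = \sum_{s_1,\,\mathfrak{d}_{-u_d}} \frac{1}{\binom{n-r}{d-1}(n-r-d+1)} \log(d-1) = \log(d-1) \tag{22}$$

$$\leq 1 + P_e\, H[\mathcal{E}(S_1, \mathcal{D}_{-u_d})\,|\,S_1, \mathcal{D}_{-u_d}, \mathcal{I}_{-S_1}, u_d] + I[\mathcal{E}(S_1, \mathcal{D}_{-u_d});\, y\,|\,S_1, \mathcal{D}_{-u_d}, \mathcal{I}_{-S_1}, u_d]$$

$$\leq 1 + P_e \log(d-1) + H[y\,|\,S_1, \mathcal{D}_{-u_d}, \mathcal{I}_{-S_1}, u_d], \tag{23}$$

where (22) is obtained using (21), and $P_e =
\Pr\{\hat{\mathcal{E}}(S_1, \mathcal{D}_{-u_d}) \neq \mathcal{E}(S_1, \mathcal{D}_{-u_d})\} \leq P_{e_{avg}}$ denotes the
average error probability in declaring the residual association
pattern.^7 The summation term in (22) denotes summation
over all possible realizations of $S_1, \mathcal{D}_{-u_d}$. Using the fact that
conditioning reduces entropy in (23), we have

$$\sum_{l=1}^{T_{NA}} H[y(l)\,|\,S_1, \mathcal{D}_{-u_d}, \mathcal{I}_{-S_1}, u_d] \geq (1 - P_e)\log(d-1) - 1. \tag{24}$$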

The presence of items from the set $\{\mathcal{I}_{-S_1}, u_d\}$ in a test can
either reduce the entropy or leave the entropy of the test
outcome unaffected. So, we consider only pooling designs that
do not contain any item from the set $\{\mathcal{I}_{-S_1}, u_d\}$. Therefore,
the entropy of a test is dependent only on the realization of
$S_1, \mathcal{D} \setminus u_d$, i.e.,

$$H[y(l)\,|\,S_1, \mathcal{D}_{-u_d}, \mathcal{I}_{-S_1}, u_d] = H[y(l)\,|\,S_1, \mathcal{D}_{-u_d}] = \sum_{s_1,\,\mathfrak{d}_{-u_d}} \Pr\{s_1, \mathfrak{d}_{-u_d}\}\, H[y(l)\,|\,s_1, \mathfrak{d}_{-u_d}].$$

Suppose that we are given a pool of $g_l$ items for the $l^{th}$ test.
The entropy of the $l^{th}$ test outcome is non-zero only for those
realizations of $S_1, \mathcal{D} \setminus u_d$ for which the $l^{th}$ pool contains exactly
one defective and the inhibitor. This is because, otherwise,
there is no randomness in the test outcome. There are
$g_l(g_l - 1)\binom{n-r-g_l}{d-2}$ such possible realizations for $2 \leq g_l \leq (n - r -
d + 2)$, and none for $g_l = 0, 1$ and for $g_l > (n - r - d + 2)$.
For each of these realizations, with $2 \leq g_l \leq n - r - d + 2$,
the entropy of the test outcome is given by

$$h \triangleq \frac{1}{d-1}\log(d-1) + \frac{d-2}{d-1}\log\frac{d-1}{d-2}.$$
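
The per-test entropy above is the binary entropy of $\frac{1}{d-1}$: given a pool with exactly one unknown defective and the inhibitor $S_1$, the outcome is negative exactly when $S_1$ happens to be associated with that defective. The sketch below evaluates it and compares it with $\frac{1}{d}\log d$, the approximation used later in the proof; the function name is an assumption made for illustration.

```python
import math


def outcome_entropy(d):
    """Binary entropy (in bits) of probability 1/(d-1), for d >= 3."""
    p = 1.0 / (d - 1)
    return p * math.log2(d - 1) + (1.0 - p) * math.log2((d - 1) / (d - 2))


for d in (10, 100, 1000):
    print(d, outcome_entropy(d), math.log2(d) / d)
```

The ratio of the two quantities approaches one only slowly (the gap is of order $\frac{1}{d}$ against $\frac{\log d}{d}$), which is harmless for a scaling argument.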


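6 For brevity, we omit the conditioning on $\mathcal{E}(\mathcal{I}_{-S_1}, u_d)$, which is also
revealed by the genie, in the entropy and mutual information terms.

7 The inequality holds because $\mathbb{E}\left[\mathbb{I}\left(\hat{\mathcal{E}}(S_1, \mathcal{D}_{-u_d}) \neq \mathcal{E}(S_1, \mathcal{D}_{-u_d})\right)\right]
\leq \mathbb{E}\left[\mathbb{I}\left((\hat{S}_1, \hat{\mathcal{D}}_{-u_d}, \hat{\mathcal{E}}(\hat{S}_1, \hat{\mathcal{D}}_{-u_d})) \neq (S_1, \mathcal{D}_{-u_d}, \mathcal{E}(S_1, \mathcal{D}_{-u_d}))\right)\right]$,
where $\mathbb{E}[\cdot]$ and $\mathbb{I}(\cdot)$ represent the expectation operator and the indicator
function respectively.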

















13 


Therefore, we have 

H[y{l)\S^V_ Ud 

1 


< 


_1_ 2f n ~ r ~9i\ , 

(nK^-^+oH ^-2 ■ 


The term dependent on gi is re-written as 


9i 


2 n-r-gi 


d—2 


(d- 2)! 




where 


d-3 


i=o 

v 


n — r — g — e — j 
d-2 

> ( 1 


> 1 + - 
9 


n — r — g — e 


1 H- 

n — r — g — e 

g{d ~ 2) 


d-2 


2e 

> e 9 


2e 


In 1 


> 1 


1 


1 - 


> 1 , 


d-2 


approximation ln(l + x) ^ x in (26), for x << 1. 


To ensure f(g — e) < f(g) for all g < ^ — 4 = g 2 , it is 
required that 

d—3 d -3 

(5 - «) 2 n _ r - 9+e _ ■?) < 5 2 n ~ r - 9 - 3) 

3 =0 J=0 

(27) 


(25) 


d-3 / 

^n(i 

J=0 v 

1 + 


= exp 


n-r-g-j 
e 

n — r — g — d + 3 
{d — 2)e 


< 1 + 


d—2 


9-£ 


< 1 + 


< 1 


9 - e 
2 


9-£ 


f(9i) ~9i Wdi-r-gi - j). 

3= 0 

We now maximize the above term with respect to gi G [2 , n — 
r — d + 2] to obtain a lower bound on the number of tests. 
The following lemma gives the approximate optimum value 
of gi (denoted by g opt ). It is shown that f(g t ) > f(g t + e) 
for all gi > g opt and 0 < e < 1, and f(g t - e) < f(gi ) for 
all gi < g op t • Since g opt is independent of l, hereupon the 
subscript l is dropped. It must be noted that (g + e), (g — e) G 
[2, n — r — d + 2]. 

Lemma 1: There exists no so that for all n > no, the opti¬ 
mum value of g that maximizes f(g) is given by 
where k = o(^). 

Proof: To ensure f(g) > f(g + e) for all g > 
and 0 < e < 1, it suffices that 

d-3 d-3 

g 2 Y[(n-r-g-j ) > (g + e) 2 JJ(n - r - g - e - j) 

3=0 j =0 

d-3 , 

-nf 1 


^n — r — g — d + 3 

2(n - r - - d + 3) In (l + -O 

^- (d — 2)e > L (28) 

Since the above function is decreasing in g , it is sufficient to 
prove that the above inequality is satisfied for g = g 2 . In order 
to satisfy ( [28] ) for all n > no and some finite positive integer 
no at g = # 2 , it suffices that 

2 (n — r — #2 — d + 3)e 


a 


(02 -e){d- 2)e 

+ n) 


n d n n 


0- 

a- 


d(4+e) 
2 n 


)(i-D 


> 1 


> 1 


(29) 


d 


!-a 


+ 7) 

d n n' ^ -| 

2d+0.5e 4+e ^ ’ 


which is true because r The inequalit y |29| ) is obtained 
by using the approximation ln(l + x) ^ x in (28), for x « 1. 


Therefore, we have g opt G 


+ fc, for fc = o (jffij* 


2n ^ 


2 n _ 4 


2 n 
’ d—2 


, and so gopt — 


From ( [25] ) and ( |24| ), using the approximation h ~ ^ log d, an 
asymptotic lower bound on the number of tests for vanishing 
error probability is given by 

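$$T_{NA} = \Omega\left( d \cdot \frac{\binom{n-r}{d-1}(n-r-d+1)}{g_{opt}^{2}\binom{n-r-g_{opt}}{d-2}} \right). \qquad (30)$$

We now show that the fractional term above scales as $d$.

$$\begin{aligned}
\frac{\binom{n-r}{d-1}(n-r-d+1)}{g_{opt}^{2}\binom{n-r-g_{opt}}{d-2}}
&\approx \frac{\binom{n-r}{d-1}(n-r-d+1)}{\frac{4n^{2}}{d^{2}}\binom{n-r-g_{opt}}{d-2}} \\
&= \frac{\prod_{i=0}^{d-2}(n-r-i)\,(n-r-d+1)}{\frac{4n^{2}}{d^{2}}(d-1)\prod_{i=0}^{d-3}(n-r-g_{opt}-i)} \\
&\approx \frac{d}{4}\prod_{i=0}^{d-3}\frac{n-r-i}{n-r-g_{opt}-i} \qquad (31) \\
&= \frac{d}{4}\prod_{i=0}^{d-3}\left(1+\frac{g_{opt}}{n-r-g_{opt}-i}\right) \\
&\geq \frac{d}{4}\left(1+\frac{g_{opt}}{n-r-g_{opt}}\right)^{d-2} \\
&\approx \frac{d}{4}\, e^{\frac{g_{opt}(d-2)}{n-r-g_{opt}}} \approx \frac{e^{2}}{4}\, d. \qquad (32)
\end{aligned}$$

Since the above function is increasing in $g$, it is sufficient to
prove that the above inequality is satisfied for $g = g_1$. Note
that, since $r = o(n)$ and $d \to \infty$, we have $n - r - g_1 =
\Omega(n)$. So, in order to satisfy (26) for all $n > n_0$ and some
finite positive integer $n_0$ at $g = g_1$, it suffices that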


which is true. The above inequality follows by using the 


8 Recall that this was assumed at the beginning of the proof of the third 
lower bound. 
It must be noted that the ratio notion of approximation does not
affect the scaling of the number of tests. The approximations
in (31) and (32) make use of the fact that $r, d = o(n)$ and
$g_{opt} = \frac{2n}{d} + o\left(\frac{n}{d}\right)$. Therefore, from (30) and (32), we have

$$T_{NA} = \Omega(d^{2}).$$ ■


The lower bounds for the IDG-WSI model are obtained in
the following theorem. Since we are interested in asymptotic
lower bounds, we assume that the limits $\lim_{n \to \infty} \frac{I_{max}}{r}$ and
$\lim_{n \to \infty} \frac{c}{I_{max}}$ exist, and $I_{max} \to \infty$, where $c$ is the positive
integer defined in the proof below. The ideas used to obtain
the following theorem are similar to those used in Theorem 4.
However, the “second constraint” (mentioned in the proof of
the following theorem) needs to be accounted for.

Theorem 5: An asymptotic lower bound on the number of
tests required for non-adaptive pooling designs for solving the
IDG-WSI problem with vanishing error probability for $r, d =
o(n)$ is given by

$$\max\left\{ \Omega\left((r + d)\log n + I_{max} d\right),\; \Omega\left(\frac{I_{max}^{2}}{\log I_{max}} \log n\right) \right\}.$$

An additional asymptotic lower bound is given by $\Omega(d^2)$ when
either 1) $r = (c - 1)d + kd$, for some constant $0 < k < 1$ and
$I_{max} = c$, or 2) $r = (c - 1)d + k$ and $(c - 1)d < r < cd$, for
positive integer $k = o(d)$ and $I_{max} = c$, or 3) $(c - 1)d < r <
cd$ and $I_{max} \geq c + 1$.

Proof: The first lower bound is obtained by lower bounding
$H(\mathcal{E}(\mathcal{I},\mathcal{D}) \mid (\mathcal{I},\mathcal{D}))$ in (16) as follows. Two constraints
need to be satisfied while counting the entropy of association
pattern.


• First constraint: Minimum degree of a vertex in $\mathcal{I}$ is one.

• Second constraint: Maximum degree of a vertex in $\mathcal{D}$ is
no more than $I_{max}$.
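The two constraints can be checked mechanically for any candidate association pattern. The sketch below is only an illustration: it assumes the pattern is stored as a dictionary from each defective to the set of inhibitors associated with it, and the function name, the item labels and the data layout are hypothetical choices rather than notation from the paper.

```python
from typing import Dict, Set


def satisfies_constraints(
    assoc: Dict[str, Set[str]], inhibitors: Set[str], i_max: int
) -> bool:
    """Check the two constraints on an association pattern.

    assoc maps each defective to the set of inhibitors associated with it.
    """
    covered = set().union(*assoc.values())
    # First constraint: every inhibitor is associated with at least one defective.
    first = inhibitors <= covered
    # Second constraint: no defective is associated with more than i_max inhibitors.
    second = all(len(inhs) <= i_max for inhs in assoc.values())
    return first and second


pattern = {"u1": {"s1", "s2"}, "u2": {"s3"}}
print(satisfies_constraints(pattern, {"s1", "s2", "s3"}, i_max=2))  # True
print(satisfies_constraints(pattern, {"s1", "s2", "s3"}, i_max=1))  # False
```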


We now consider the three possible cases below and show
that in each of the cases the lower bound on the number
of association patterns scales exponentially in $I_{max} d$. Let
$(c - 1)d < r < cd$, for some positive integer $c$, and so $I_{max} \geq
c$. Define $\alpha_1 = \lim_{n \to \infty} \frac{c}{I_{max}}$ and $\alpha_2 = \lim_{n \to \infty} \frac{I_{max}}{r}$.

Case 1: $\alpha_1 < 1$ and $\alpha_2 < 1$. There exist positive constants
$\beta_1 < 1$ and $\beta_2 < 1$ so that $c < \beta_1 I_{max}$ and $I_{max} < \beta_2 r$, $\forall n >
n_0$. Define an association pattern, where each defective starting
from $u_1$ is assigned a disjoint set of $c$ inhibitors until every
inhibitor is covered. Therefore we have $\max_{u_i \in \mathcal{D}} |\mathcal{I}(u_i)| \leq c$.
Since the first constraint is satisfied, each defective is now free
to choose an association pattern so that $\max_{u_i} |\mathcal{I}(u_i)| \leq I_{max}$.

The number of such possible association patterns can be lower 
bounded by 



Thus, the entropy of association pattern in this case scales
(asymptotically) as the logarithm of the above quantity, which
is given by $\Omega(I_{max} d)$.
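The base pattern used in Case 1, in which defectives are handed disjoint blocks of $c$ inhibitors until every inhibitor is covered, can be sketched as follows. This is a minimal illustration under the assumption that $c = \lceil r/d \rceil$ and that inhibitors and defectives are indexed from 0; the function name is hypothetical.

```python
import math
from typing import List


def base_pattern(r: int, d: int) -> List[List[int]]:
    """Assign disjoint blocks of c = ceil(r / d) inhibitors to defectives in order.

    Entry i of the result lists the inhibitors associated with defective i.
    """
    c = math.ceil(r / d)
    pattern: List[List[int]] = [[] for _ in range(d)]
    for s in range(r):
        pattern[s // c].append(s)  # inhibitor s goes to the (s // c)-th defective
    return pattern


pattern = base_pattern(r=7, d=3)
print(pattern)  # [[0, 1, 2], [3, 4, 5], [6]]
```

Every inhibitor receives degree one, so the first constraint holds, and no defective is associated with more than $c$ inhibitors, which leaves room for the additional associations counted in the case analysis.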

Case 2: $\alpha_1 < 1$ and $\alpha_2 = 1$. There exist positive constants
$\beta_1 < 1$ and $\beta_2 < 1$ with $\beta_2 > \beta_1$ so that $c < \beta_1 I_{max}$ and
$I_{max} > \beta_2 r$, $\forall n > n_0$. So, we have $I_{max} - c > (\beta_2 - \beta_1) r$,
$\forall n > n_0$. Using similar arguments as in Case 1, where after
satisfying the first constraint, $I_{max} - c$ inhibitors are chosen to
associate with each defective, we now have that the entropy
of association pattern in this case scales asymptotically as
$\Omega(rd) = \Omega(I_{max} d)$.

Case 3: $\alpha_1 = 1$. Note that this case constitutes a large
inhibitor regime with respect to the number of defectives
(because $I_{max} \to \infty$). There exists a positive constant $\beta_1 < 1$
so that $c > \beta_1 I_{max}$, $\forall n > n_0$. The number of ways of
assigning each defective to a disjoint set of $(c - 1)$ inhibitors
is given by

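$$\begin{aligned}
&\binom{r}{c-1}\binom{r-(c-1)}{c-1}\binom{r-2(c-1)}{c-1}\cdots\binom{r-(d-1)(c-1)}{c-1} \\
&= \frac{r!}{((c-1)!)^{d}\,(r-d(c-1))!} \\
&\overset{(a)}{\geq} \frac{\sqrt{2\pi}\, r^{r+\frac{1}{2}} e^{-r}}{e^{2}(c-1)^{d(c-1+\frac{1}{2})} e^{-d(c-1)} (r-d(c-1))^{r-d(c-1)+\frac{1}{2}} e^{-(r-d(c-1))}} \\
&\overset{(b)}{\geq} \frac{\sqrt{2\pi}\,(d(c-1))^{d(c-1)+\frac{1}{2}} e^{-dc}}{e^{2}(c-1)^{d(c-1+\frac{1}{2})} e^{-d(c-1)} (r-d(c-1))^{r-d(c-1)+\frac{1}{2}} e^{-(r-d(c-1))}} \\
&\overset{(c)}{\geq} \frac{\sqrt{2\pi}\,(d(c-1))^{d(c-1)+\frac{1}{2}} e^{-dc}}{e^{2}(c-1)^{d(c-1+\frac{1}{2})} e^{-d(c-1)}\, d^{d+\frac{1}{2}}} \\
&= \frac{\sqrt{2\pi}\,(c-1)^{\frac{1}{2}} d^{d(c-2)}}{e^{2}(c-1)^{\frac{d}{2}} e^{d}}
 = \frac{\sqrt{2\pi}\,(c-1)^{\frac{1}{2}} d^{d(c-2)}}{e^{2}\, d^{\frac{d}{2}\log_d (c-1)}\, d^{d \log_d e}} \\
&= \Omega\left((c-1)^{\frac{1}{2}}\, d^{\,d\left(c - \frac{\log_d (c-1)}{2} - \log_d e - 2\right)}\right),
\end{aligned}$$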

where (a) follows from Stirling’s lower and upper bounds for
factorial functions, (b) and (c) follow from the fact that $d(c - 1) < r < cd$.
Observe that the remaining $r - d(c - 1)$ inhibitors
can be assigned one each to one defective without violating
the second constraint. Thus, the entropy of association pattern
in this case scales asymptotically as $\Omega(cd) = \Omega(\beta_1 I_{max} d) =
\Omega(I_{max} d)$.
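The first equality in the Case 3 count, namely that the product of binomial coefficients collapses to $r! / \left(((c-1)!)^{d}\,(r - d(c-1))!\right)$, can be verified with exact integer arithmetic. The sketch below does this for one small, arbitrarily chosen instance satisfying $d(c-1) < r < cd$; the function names and the numbers are illustrative only.

```python
import math


def count_sequential(r: int, d: int, c: int) -> int:
    """Product of binomials: each of d defectives picks c - 1 unused inhibitors."""
    ways, remaining = 1, r
    for _ in range(d):
        ways *= math.comb(remaining, c - 1)
        remaining -= c - 1
    return ways


def count_closed_form(r: int, d: int, c: int) -> int:
    """Closed form r! / (((c - 1)!)^d * (r - d(c - 1))!)."""
    return math.factorial(r) // (
        math.factorial(c - 1) ** d * math.factorial(r - d * (c - 1))
    )


r, d, c = 11, 3, 4  # satisfies d(c - 1) = 9 < r = 11 < cd = 12
print(count_sequential(r, d, c), count_closed_form(r, d, c))  # 92400 92400
```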


Fig. 8: A possible association pattern where, without loss of
generality, $u_1$ is assumed to be a defective for which $|\mathcal{I}(u_1)| =
I_{max}$. The set of inhibitors and defectives that are associated
only among themselves (which the genie reveals) are inside
the dotted ellipse.

The second lower bound is obtained as shown below. There
could exist at least one defective $u_1 \in \mathcal{D}$ so that $|\mathcal{I}(u_1)| =
I_{max}$. Consider an association pattern where $\mathcal{I}(u_1) \cap \mathcal{I}(u_k) =
\emptyset$, for $u_k \in \mathcal{D}$, $k \neq 1$, as depicted in Fig. 8. Now, we use a
similar argument as in the proof of the second lower bound in
Theorem 4. Let a genie reveal the inhibitor subset $\mathcal{I} - \mathcal{I}(u_1)$,
the defective subset $\mathcal{D} - u_1$ and their associations. Now, none
of the items from the sets $\mathcal{I} - \mathcal{I}(u_1)$ and $\mathcal{D} - u_1$ is useful in
distinguishing the inhibitors in the set $\mathcal{I}(u_1)$ from the unknown
defective and the normal items. This is because the entropy of
an outcome is zero if the test contains some defective from $\mathcal{D} -
u_1$ but none of its associated inhibitors (which are only from
the set $\mathcal{I} - \mathcal{I}(u_1)$), as such a test outcome is always positive.
The entropy of an outcome does not change if any of the
inhibitors $\mathcal{I} - \mathcal{I}(u_1)$ with or without its associated defectives
(which are only from the set $\mathcal{D} - u_1$) is present in the test.
Thus, the problem is now reduced to the 1-inhibitor problem
of finding $I_{max}$ inhibitors amidst $n - (r - I_{max}) - (d - 1)$
normal items and one (unknown) defective. A lower bound on
the number of non-adaptive tests for this problem is clearly a
lower bound on the number of tests for the original problem
of determining the association graph for the IDG-WSI model.
Since $r, d = o(n)$ and $I_{max} \to \infty$ as $n \to \infty$, using Theorem 3 we
get the lower bound $\Omega\left(\frac{I_{max}^{2}}{\log I_{max}} \log n\right)$.
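The claim that a test containing a revealed defective but none of its associated inhibitors is always positive can be illustrated with a toy outcome function. The sketch assumes the outcome rule described in this paper's model (a test is positive iff it contains at least one defective none of whose associated inhibitors is in the pool); the function name and the item labels are hypothetical.

```python
from itertools import combinations
from typing import Dict, Set


def pool_outcome(pool: Set[str], assoc: Dict[str, Set[str]]) -> bool:
    """Positive iff some defective in the pool has none of its inhibitors in the pool."""
    return any(u in pool and not (inhs & pool) for u, inhs in assoc.items())


# u1 plays the role of the unknown defective; u2 and its inhibitor s3 are revealed.
assoc = {"u1": {"s1", "s2"}, "u2": {"s3"}}
others = ["u1", "s1", "s2", "n1", "n2"]  # everything except u2 and s3

# Any pool that contains u2 but not s3 is positive, whatever else is added to it.
always_positive = all(
    pool_outcome({"u2", *extra}, assoc)
    for k in range(len(others) + 1)
    for extra in combinations(others, k)
)
print(always_positive)  # True
```

Since the outcome of such a pool is the same for every configuration of the remaining items, it carries no information about them, which is the sense in which its entropy is zero.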

The third lower bound is obtained below for the case where
$r = (c - 1)d + kd$, for some constant $0 < k < 1$ and
$I_{max} = c$. The proofs for the other two cases mentioned in
the statement of the theorem are similar. Parts of this proof
are similar to the proof of the third lower bound in Theorem 4,
and hence we only point out the differences in this proof. As
in the proof of Theorem 4, we consider a reduced problem as
follows. As depicted in Fig. 9, a specific class of association
graph is considered, where disjoint sets of $c - 1$ inhibitors
$\{\mathcal{I}_1, \mathcal{I}_2, \cdots, \mathcal{I}_d\}$ are associated with one defective each, i.e.,
each item in the set $\mathcal{I}_i$ with $|\mathcal{I}_i| = c - 1$ is associated
with the defective $U_i$, for $i = 1, \cdots, d$. Each item in the
set of inhibitors $\mathcal{I}_{d+1} = \{S_{(c-1)d+1}, \cdots, S_{(c-1)d+kd-1}\}$ is
associated with one distinct defective with which the sets of
inhibitors $\{\mathcal{I}_1, \cdots, \mathcal{I}_{kd-1}\}$ are also associated, i.e., $S_{(c-1)d+j}$
is associated with the defective $U_j$, for $j = 1, \cdots, kd - 1$. The
remaining inhibitor $S_r$ is associated with exactly one of the
defectives in the set $\mathcal{D}_{S_r} = \{U_{kd}, \cdots, U_d\}$. It is now easily
seen that the first constraint is satisfied, and $|\mathcal{I}(U_j)| \leq c$ for
all $j$, which means that the second constraint is also satisfied.
Now, let a genie reveal the realizations of {Zi,Z 2 , • • • ,2^ + 1 } 
and {Ui, • • • ,Ukd-i}, given by J = {/ 1 , J 2 ,--* ,/<*+ 1 } and 
3>s r — {u\, • • • ,Ukd-i} respectively. The association pattern 
between them given by ^s r ) i s a i so revealed. 

The “residual message” in the system is now given by W\ = 
{Sr,Vs r ,£(Sr,V Sr ),£({Ikd,' ■ • ,/d},2> 5r )}. We now show 
that determining W\ = £(S r ,Vs r ) itself requires order of 
d 2 tests. It is easy to see that there is no reduction in the 
mutual information I [Wi ; y| S r , T*s r , S>s r ,£(*#, @s r )\ if 
the items in the set , Ikd-i, Id+i, ^ s r } do not par¬ 

ticipate in any of the tests. So, we assume hereon that 
these items do not participate in any of the tests, and thus 
denote the rest of the items which participate in the tests by 

W' ±AT\J{I kd ,--- ,Id,S r }\jT> Sr . 


For the reduced problem considered, we have

$$\max \Pr\{\hat{W}_1 \neq W_1\} \geq E_f \Pr\{\hat{W}_1 \neq W_1\} \triangleq P_{e_{avg}},$$


where $f$ denotes some probability mass function of the
residual association graph. Let $f_1$ and $f_2$ denote independent
probability mass functions of the residual association patterns
$\mathcal{E}(s_r, \mathcal{D}_{s_r})$ and $\mathcal{E}(\{I_{kd}, \cdots, I_d\}, \mathcal{D}_{s_r})$, for any realization of
$(S_r, \mathcal{D}_{S_r})$ given by $(s_r, \mathcal{D}_{s_r})$. The function $f_2$ is such that

$$\Pr\{(s_r, u) \in \mathcal{E}(s_r, \mathcal{D}_{s_r})\} = \frac{1}{d(1-k)+1}, \; \forall u \in \mathcal{D}_{s_r}, \qquad (33)$$

for any realization $(s_r, \mathcal{D}_{s_r})$. Also, it is assumed $f$ is such
that the realizations of $(S_r, \mathcal{D}_{S_r})$ are uniformly distributed
across the rest of the items, i.e., occurrence of every realization
happens with probability $\frac{1}{\binom{n-r-kd+2}{d(1-k)+1}(n-r-d+1)}$.

Let $M$ be the test matrix which is known a priori. Also, let
the matrix $M_1$ denote the test matrix $M$ whose columns are
restricted to the items $\mathcal{W}' \setminus \{I_{kd}, \cdots, I_d\}$, and the matrix $M_2$
denote the test matrix $M$ whose columns are restricted to the
items $\mathcal{W}' \setminus \{s_r\}$. Denote the “virtual outcome vector” obtained
by testing the items using the matrices $M_1$ and $M_2$ by
$y_1(\mathcal{E}(s_r, \mathcal{D}_{s_r}))$ and $y_2(\mathcal{E}(\{I_{kd}, \cdots, I_d\}, \mathcal{D}_{s_r}))$ respectively.$^{9}$
Note that $y = y_1 . y_2$, i.e., the actual outcome vector is equal
to component-wise Boolean AND of the two virtual outcome
vectors for every realization $(s_r, \mathcal{D}_{s_r})$. Since $\mathcal{E}(s_r, \mathcal{D}_{s_r})$ and
$\mathcal{E}(\{I_{kd}, \cdots, I_d\}, \mathcal{D}_{s_r})$ are statistically independent messages,
using data-processing inequality, we have

$$I\left[\mathcal{E}(s_r, \mathcal{D}_{s_r}); y \mid s_r, \mathcal{D}_{s_r}, \{I_{kd}, \cdots, I_d\}\right] \leq I\left[\mathcal{E}(s_r, \mathcal{D}_{s_r}); y_1 \mid s_r, \mathcal{D}_{s_r}\right]. \qquad (34)$$

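Now, applying Fano’s inequality, we have

$$\begin{aligned}
&H\left[\mathcal{E}(S_r, \mathcal{D}_{S_r}) \mid S_r, \mathcal{D}_{S_r}, \{I_{kd}, \cdots, I_d\}\right] \\
&= \frac{1}{\binom{n-r-kd+2}{d(1-k)+1}(n-r-d+1)} \sum_{s_r, \mathcal{D}_{s_r}} \log(d(1-k)+1) \\
&\leq 1 + P_e H\left[\mathcal{E}(S_r, \mathcal{D}_{S_r}) \mid S_r, \mathcal{D}_{S_r}, \{I_{kd}, \cdots, I_d\}\right] + I\left[\mathcal{E}(S_r, \mathcal{D}_{S_r}); y \mid S_r, \mathcal{D}_{S_r}, \{I_{kd}, \cdots, I_d\}\right] \\
&\leq 1 + P_e \log(d(1-k)+1) + \frac{1}{\binom{n-r-kd+2}{d(1-k)+1}(n-r-d+1)} \times \sum_{s_r, \mathcal{D}_{s_r}} I\left[\mathcal{E}(S_r, \mathcal{D}_{S_r}); y_1 \mid \{S_r, \mathcal{D}_{S_r}\} = \{s_r, \mathcal{D}_{s_r}\}\right] \qquad (35) \\
&\leq 1 + P_e \log(d(1-k)+1) + \frac{1}{\binom{n-r-kd+2}{d(1-k)+1}(n-r-d+1)} \times \sum_{s_r, \mathcal{D}_{s_r}} H\left[y_1 \mid \{S_r, \mathcal{D}_{S_r}\} = \{s_r, \mathcal{D}_{s_r}\}\right],
\end{aligned}$$

where $P_e = \Pr\{\hat{\mathcal{E}}(S_r, \mathcal{D}_{S_r}) \neq \mathcal{E}(S_r, \mathcal{D}_{S_r})\} \leq P_{e_{avg}}$ and (35)
follows from (34). Now, following similar steps after (23) in
the proof of Theorem 4, we have the lower bound of $\Omega((d(1 -
k) + 1)^2) = \Omega(d^2)$ tests. ■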


9 The arguments of the virtual outcome vectors denote that the vectors are 
functions of their arguments. 
Fig. 9: Association graph with realizations $\{I_1, \cdots, I_d, I_{d+1}, u_1, \cdots, u_{kd-1}\}$ considered for obtaining the third lower bound
for the IDG-WSI model. The genie reveals the realizations $\{I_1, \cdots, I_{kd-1}, I_{d+1}\}$ along with their association pattern with the
realizations $\{u_1, \cdots, u_{kd-1}\}$. It also reveals the realizations $\{I_{kd}, \cdots, I_d\}$ which are known to be associated with the unknown
realization of the remaining defectives $\mathcal{D}_{S_r}$. It is also known that the unknown inhibitor $S_r$ is associated with exactly one of
the defectives in the set $\mathcal{D}_{S_r}$. Such an association graph is chosen so that the constraint $I_{max} = c$ is not violated.


Thus, in the $d = O(I_{max})$ and $d = O(r)$ regimes, the upper
bound on the number of tests for the proposed non-adaptive
pooling design is away from the proposed (second) lower
bound for the IDG-WSI and IDG-NSI models by $\log I_{max}$ and
$\log r$ multiplicative factors respectively. In the $I_{max} = o(d)$
and $r = o(d)$ regimes, the upper bounds exceed the proposed
(third) lower bounds by $\log n$ multiplicative factors for both
the IDG models, with some restrictions on $I_{max}$ or $r$ in the
IDG-WSI model. When these restrictions are removed, the
evaluation of the lower bound might require consideration of
other association graphs like in Fig. 2, as an extension of the
association graph used in the proof of the third lower bound in
Theorem 5. But even for the graph in Fig. 2, the optimization
of the entropy over the pool size becomes combinatorially
cumbersome. We thus relegate the evaluation of the lower bound
for the unconstrained IDG-WSI model to future work. For
the proposed two-stage adaptive pooling design, the upper
bound on the number of tests is away from the proposed (first)
lower bound by $\log n$ multiplicative factors for both the IDG-WSI
and IDG-NSI models in all regimes of the number of
defectives and inhibitors.

V. Conclusion 

A new generalization of the 1-inhibitor model, termed the IDG
model, was introduced. In the proposed model, an inhibitor
can inhibit a non-empty subset of the defective set of items.
A probabilistic non-adaptive pooling design and a two-stage
adaptive pooling design were proposed, and lower bounds on
the number of tests were identified. In both the small and
large inhibitor regimes, the upper bound on the number of
tests for the proposed non-adaptive pooling design is shown
to be close to the lower bound, with a difference of logarithmic
multiplicative factors in the number of items.

For the proposed two-stage adaptive pooling design, the
upper bound on the number of tests is close to the lower bound
in all regimes of the number of inhibitors and defectives,
the difference being logarithmic multiplicative factors in the
number of items.

Future work could include more practical versions of the
IDG model, such as taking the following considerations into
account.

1) Cancellation effect of the normal items on the inhibitors.

2) Partial inhibition of expression of defectives by the
inhibitors, which also naturally embraces the presence of
inhibitors in the semi-quantitative group testing model.

3) Inclusion of the $k$-inhibitor model, for unknown $k$, as a
part of the association pattern in the IDG model.

Obtaining lower and upper bounds on the number of tests
for the aforementioned variants of the IDG model, along with
inclusion of noisy tests, should be more involved and worth
pursuing.